Data Management / DM-26530

Improved batch mode of ingesting table contributions into Qserv


    Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels: None

      Description

      The current implementation of the Ingest system has a binary tool qserv-replica-file-ingest that provides a batch mode for loading a list of contributions (read from input files) into various tables:

      qserv-replica-file-ingest FILE-LIST <contributions> [--auth-key=<key>] ...
      

      The mandatory parameter <contributions> is the path to a JSON-formatted text file containing an array of file contributions into potentially any tables, using multiple transactions. Here is an example of such a file:

      [{"worker-host":"qserv-db01","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187107_overlap.txt"},
       {"worker-host":"qserv-db01","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187107.txt"},
       {"worker-host":"qserv-db02","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187108_overlap.txt"},
       {"worker-host":"qserv-db02","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187108.txt"},
       {"worker-host":"qserv-db01","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187109_overlap.txt"},
       {"worker-host":"qserv-db01","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187109.txt"},
       {"worker-host":"qserv-db02","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187110_overlap.txt"},
       {"worker-host":"qserv-db02","worker-port":25002,"transaction-id":123,"table":"Object","type":"P","path":"input/chunk_187110.txt"}
      ]
      

      Although this method provides a lot of flexibility in mixing contributions into any tables using different transactions, it creates a few inconveniences for implementing ingest workflows. One of those is caused by encoding transaction identifiers into the file. If, for some reason, a transaction fails and has to be restarted, the previous version of the table contributions file becomes invalid and has to be regenerated to reference the new transaction. Another problem is that mixing transactions in the same file requires aborting all transactions mentioned in the file if any contribution fails to be ingested. And finally, in any realistic workflow it's impractical to mix contributions into different tables in the same file.

      The proposed effort is meant to keep the current batch method and to add another one that addresses the above-mentioned issues:

      qserv-replica-file-ingest FILE-LIST-TRANS <transaction-id> <table> <type> <contributions> [--auth-key=<key>] ...
      

      In this mode, the <contributions> file has fewer attributes in each element of the JSON array. For example:

      [{"worker-host":"qserv-db01","worker-port":25002,"path":"input/chunk_187107_overlap.txt"},
       {"worker-host":"qserv-db01","worker-port":25002,"path":"input/chunk_187107.txt"},
       {"worker-host":"qserv-db02","worker-port":25002,"path":"input/chunk_187108_overlap.txt"},
       {"worker-host":"qserv-db02","worker-port":25002,"path":"input/chunk_187108.txt"},
       {"worker-host":"qserv-db01","worker-port":25002,"path":"input/chunk_187109_overlap.txt"},
       {"worker-host":"qserv-db01","worker-port":25002,"path":"input/chunk_187109.txt"},
       {"worker-host":"qserv-db02","worker-port":25002,"path":"input/chunk_187110_overlap.txt"},
       {"worker-host":"qserv-db02","worker-port":25002,"path":"input/chunk_187110.txt"}
      ]
      

      This file would produce the same result as the existing one if used as illustrated below:

      qserv-replica-file-ingest FILE-LIST-TRANS 123 Object P <contributions>
      
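      A workflow could generate the simplified contributions file programmatically from a list of chunk identifiers. The following Python sketch is for illustration only and is not part of the tool; the two-worker placement function pick_worker and the input file layout are assumptions that merely reproduce the example above.

      # Sketch: build the simplified contributions file used by FILE-LIST-TRANS.
      # The chunk-to-worker mapping (pick_worker) and the input paths are
      # illustrative assumptions; a real workflow would use its own placement.
      import json

      def pick_worker(chunk_id):
          """Toy two-worker placement that happens to match the example above."""
          return "qserv-db01" if chunk_id % 2 else "qserv-db02"

      def build_contributions(chunk_ids, port=25002):
          contributions = []
          for chunk_id in chunk_ids:
              host = pick_worker(chunk_id)
              # Each chunk contributes an overlap file and a chunk file.
              for suffix in ("_overlap", ""):
                  contributions.append({
                      "worker-host": host,
                      "worker-port": port,
                      "path": f"input/chunk_{chunk_id}{suffix}.txt",
                  })
          return contributions

      if __name__ == "__main__":
          with open("contributions.json", "w") as f:
              json.dump(build_contributions([187107, 187108, 187109, 187110]), f, indent=1)
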

      Advantages of the new batch mode:

      • the file of contributions is reusable across transactions (see the retry sketch after this list)
      • the scope of failures during an ingest is limited to a single transaction
      • the destinations where the table contributions are meant to be ingested do not change after aborting and restarting transactions
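
      Because the contributions file no longer encodes a transaction identifier, a failed transaction can be retried by re-running the same file under a new transaction. The following Python sketch illustrates the idea only; begin_transaction() and abort_transaction() are hypothetical helpers standing in for the Ingest system's transaction management, which is not defined in this ticket.

      # Sketch: retry a failed ingest by reusing the same contributions file
      # under a new transaction. begin_transaction()/abort_transaction() are
      # hypothetical placeholders, not real tools of the Ingest system.
      import subprocess

      def ingest_with_retry(contributions_file, table, data_type, auth_key, max_attempts=3):
          for attempt in range(max_attempts):
              transaction_id = begin_transaction()  # hypothetical helper
              result = subprocess.run([
                  "qserv-replica-file-ingest", "FILE-LIST-TRANS",
                  str(transaction_id), table, data_type, contributions_file,
                  f"--auth-key={auth_key}",
              ])
              if result.returncode == 0:
                  return transaction_id
              # Only the failed transaction has to be aborted; the contributions
              # file itself remains valid for the next attempt.
              abort_transaction(transaction_id)  # hypothetical helper
          raise RuntimeError("ingest failed after retries")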

        Attachments

          Issue Links

            Activity

            gapon Igor Gaponenko added a comment - John Gates PR: https://github.com/lsst/qserv/pull/568
            jgates John Gates added a comment -

            It looks good to me.


              People

              Assignee:
              gapon Igor Gaponenko
              Reporter:
              gapon Igor Gaponenko
              Reviewers:
              John Gates
              Watchers:
              Fritz Mueller, Igor Gaponenko, John Gates, Nate Pease

                Dates

                Created:
                Updated:
                Resolved: