Details

Type: Improvement
Status: Done
Resolution: Done
Fix Version/s: None
Component/s: Qserv
Labels: None
Story Points: 3
Epic Link:
Sprint: DB_S21_12
Team: Data Access and Database
Description
Goals
This ticket depends on DM-26101.
The performance of many critical "jobs" (algorithms) in the current implementation of the Replication/Ingest system is O(N_transactions x N_allocated_chunks). This is not efficient when the number of chunks is large. This implementation was made at a time when the Master Replication Controller had no knowledge of the actual contributions made into chunks at workers. Hence, it had to assume the worst-case scenario: at least one contribution into each allocated chunk within each transaction. This effect was demonstrated with a plot capturing the performance of the transaction commit algorithm as a function of the number of transactions (and allocated chunks), where each transaction contributed into 100 chunks spread across 30 workers.
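For illustration, here is a minimal sketch of the worst-case traversal described above. All names (Transaction, Chunk, processContribution, commitAll) are hypothetical, not the actual Replication/Ingest system API; the point is only that every (transaction, chunk) pair must be visited when per-chunk contribution records are unavailable:

{code:cpp}
#include <cstdint>
#include <vector>

struct Transaction { std::uint32_t id; };
struct Chunk { std::uint32_t number; };

// Placeholder for the real per-chunk commit work.
void processContribution(Transaction const& t, Chunk const& c) {}

void commitAll(std::vector<Transaction> const& transactions,
               std::vector<Chunk> const& allocatedChunks) {
    for (auto const& t : transactions) {
        // Worst-case assumption: each transaction may have contributed into
        // any allocated chunk, so all of them have to be inspected, which
        // makes the cost O(N_transactions x N_allocated_chunks).
        for (auto const& c : allocatedChunks) {
            processContribution(t, c);
        }
    }
}
{code}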
In reality, the N_transactions x N_allocated_chunks matrix is rather sparse (diagonal) and is never fully populated: usually only a few chunks receive contributions within a given transaction. Fortunately, a recent addition to the system (see DM-26101, which adds a bookkeeping mechanism for contributions made at workers during transactions) allows for a better implementation of these algorithms. The corresponding tables are now visible to the Master Controller. The goal of this effort is to use data from the bookkeeping table to limit the scope of the corresponding algorithms to the affected chunks (or regular tables) only, as sketched below.
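A minimal sketch of the improved approach, with the DM-26101 bookkeeping table modeled as an in-memory map for illustration. TransactionId, ChunkNumber, ContributionIndex, and commitTransaction are hypothetical names, not the actual system API:

{code:cpp}
#include <cstdint>
#include <set>
#include <unordered_map>

using TransactionId = std::uint32_t;
using ChunkNumber = std::uint32_t;

// The bookkeeping data modeled as: transaction -> set of chunks that
// actually received contributions during that transaction.
using ContributionIndex = std::unordered_map<TransactionId, std::set<ChunkNumber>>;

// Placeholder for the real per-chunk commit work.
void processContribution(TransactionId t, ChunkNumber chunk) {}

void commitTransaction(TransactionId t, ContributionIndex const& index) {
    auto const itr = index.find(t);
    if (itr == index.end()) return;  // no contributions recorded for this transaction
    // Visit only the chunks recorded in the bookkeeping table, so the cost
    // is proportional to the number of actual contributions rather than to
    // the total number of allocated chunks.
    for (ChunkNumber chunk : itr->second) {
        processContribution(t, chunk);
    }
}
{code}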
PR: https://github.com/lsst/qserv/pull/602