# Reinforce table management operations in the Ingest worker services

XMLWordPrintable

#### Details

• Type: Improvement
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
5
• Sprint:
DB_F20_09
• Team:
Data Access and Database

# Objectives

In the current implementation of the worker ingest services in Qserv, it's the responsibility of the workers to ensure a destination table exists and has a required MySQL partition before loading contributions into the table (the target MySQL partition). Therefore, each time the worker thread begins processing a table contribution request the first actions of the thread would be to:

 CREATE IF NOT EXISTS LIKE ; ALTER TABLE ADD PARTITION IF NOT EXISTS ...; 

Where, the prototype-table is already defined as the MySQL partitioned table, and it has one default partition corresponding to transactionId=0 (Note, each "super-transaction" in the Qserv Ingest system maps to a unique MySQL partition).
These statements are followed by the table data loading operation LOAD DATA INFILE ... into the new table.
An assumption behind this implementation was that each of the above-mentioned table management operations is atomic. Unfortunately, this may not be the case for the MyISAM engine of MySQL. It's been observed that when the ingest service is put under a very high load (high frequency of low size contributions into thousands of chunks while having hundreds or thousands of "super-transactions" open) the worker may fail at one of the table management operations:
Reported by Client A:

 mysql_real_query failed, query: 'ALTER TABLE fakedb.ForcedSource_1609 ADD PARTITION IF NOT EXISTS (PARTITION p215 VALUES IN (215))',  error: Partition management on a not partitioned table is not possible, errno: 1505\nmain() the application failed,  exception: IngestClient::send[lsst-qserv-db17:25002] data send,  server error: data load failed: Connection[233]::execute(_inTransaction=1)  mysql_real_query failed, query: 'ALTER TABLE fakedb.ForcedSource_1609 ADD PARTITION IF NOT EXISTS (PARTITION p215 VALUES IN (215))',  error: Partition management on a not partitioned table is not possible, errno: 1505\n" 

Reported by Client B:

 mysql_real_query failed, query: \'ALTER TABLE fakedb.ForcedSource_1614 ADD PARTITION IF NOT EXISTS (PARTITION p214 VALUES IN (214))\',  error: Can\'t find file: \'./fakedb/ForcedSource_1614.MYI\' (errno: 2 "No such file or directory"), errno: 1017\nmain() the application failed,  exception: IngestClient::send[lsst-qserv-db16:25002] data send,  server error: data load failed: Connection[219]::execute(_inTransaction=1)  mysql_real_query failed, query: \'ALTER TABLE fakedb.ForcedSource_1614 ADD PARTITION IF NOT EXISTS (PARTITION p214 VALUES IN (214))\',  error: Can\'t find file: \'./fakedb/ForcedSource_1614.MYI\' (errno: 2 "No such file or directory"), errno: 1017\n' 

While an overall probability of this happening is relatively low, the service implementation still needs to be reinforced.

## The proposed solution to the problem

Since it would be hard to track and eliminate the problem at a level of MySQL then the most reasonable approach would be to reinforce the code of the Ingests system. The new version of the code proposed in this ticket would put the above-mentioned table management operations into a critical section specific to a given database, table, and chunk. This will guarantee that only one thread at a time can modify the schema of a specific table while allowing such operations on different tables to be done in parallel.

#### Activity

Hide
Igor Gaponenko added a comment -
Show
Igor Gaponenko added a comment - PR: https://github.com/lsst/qserv/pull/582
Hide
John Gates added a comment -

There's some code duplication and there may be an issue with large numbers of mutexes being created over the lifetime of the program and never being destroyed. Otherwise, it looks good.

Show
John Gates added a comment - There's some code duplication and there may be an issue with large numbers of mutexes being created over the lifetime of the program and never being destroyed. Otherwise, it looks good.
Hide
Igor Gaponenko added a comment -

For the record, all suggestions have been considered, evaluated and implemented.

Show
Igor Gaponenko added a comment - For the record, all suggestions have been considered, evaluated and implemented.

#### People

Assignee:
Igor Gaponenko
Reporter:
Igor Gaponenko
Reviewers:
John Gates
Watchers:
Fabrice Jammes, Fritz Mueller, Igor Gaponenko, John Gates, Nate Pease