Fix Version/s: None
Team: Data Access and Database
Catalog ingest workflows that start multiple simultaneous (super-)transactions may trigger the following error condition on the Ingest workers (as reported by the ingest clients):
At the same time, the MySQL server reports:
- The problem seems to affect one specific query.
- The problem is reproducible.
- Once a client connection gets into this state, it never recovers (which points to a possible bug in the error handling of the Replication system's MySQL connection management).
Reportedly, this error code is typically seen when one of the MySQL server's timeout-controlling parameters is too low. These were the values of the relevant parameters captured when the errors were detected:
The recommended remedy is to increase the values of the parameters connect_timeout, net_read_timeout, and net_write_timeout.
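For illustration only, here is a minimal sketch of how such an increase could be expressed as runtime statements. The numeric values and the helper name timeout_statements are hypothetical; the ticket does not say which values were recommended. Only the three variable names come from the recommendation above.

```python
# Hypothetical values in seconds; the ticket does not state recommended values.
PROPOSED_TIMEOUTS = {
    "connect_timeout": 60,
    "net_read_timeout": 600,
    "net_write_timeout": 600,
}


def timeout_statements(timeouts):
    """Return one SET GLOBAL statement per server timeout variable."""
    statements = []
    for name, seconds in timeouts.items():
        # Reject values the server would not accept as a timeout.
        if not isinstance(seconds, int) or seconds <= 0:
            raise ValueError(f"{name} must be a positive integer, got {seconds!r}")
        statements.append(f"SET GLOBAL {name} = {seconds}")
    return statements


if __name__ == "__main__":
    for statement in timeout_statements(PROPOSED_TIMEOUTS):
        print(statement + ";")
```

Global values set this way apply to connections opened afterwards, so clients that are already connected would need to reconnect to pick them up.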
An attempt was made to increase the values of these parameters in the MariaDB container's startup sequence:
Unfortunately, this didn't help.
Revisit the error processing in the class lsst::qserv::replica::Connection to check whether lost connections are properly re-established after failures.
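As a point of comparison for that review, the following is a rough sketch of the reconnect-and-retry behaviour one would expect from a connection wrapper. It is not the code of lsst::qserv::replica::Connection. The ConnectionLost exception, the execute()/reconnect() interface, and the function name are all hypothetical, and the choice of MySQL client error codes 2006 and 2013 is an assumption about how a lost connection would surface.

```python
import time

# Assumption: client error codes treated as "connection lost"
# (2006 "server has gone away", 2013 "lost connection during query").
LOST_CONNECTION_CODES = {2006, 2013}


class ConnectionLost(Exception):
    """Hypothetical exception carrying a MySQL client error code."""

    def __init__(self, code):
        super().__init__(f"connection lost (client error {code})")
        self.code = code


def execute_with_reconnect(conn, query, max_attempts=3, delay_sec=0.0):
    """Run query via conn.execute(), reconnecting after a lost connection.

    conn is any object with execute(query) and reconnect() methods
    (a hypothetical interface used only for this sketch).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return conn.execute(query)
        except ConnectionLost as err:
            if err.code not in LOST_CONNECTION_CODES:
                raise  # not a dropped link: do not retry
            last_error = err
            if attempt < max_attempts:
                time.sleep(delay_sec)
                conn.reconnect()  # re-establish before the next attempt
    # All attempts failed: surface the last error instead of hiding it.
    raise last_error
```

A real implementation would also have to decide whether the failed statement is safe to replay, in particular when the connection is lost inside an open transaction.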
The code looks good.
The bug was found in the recently added support for transaction contribution. The bug was rather trivial and obvious.