# Increase the connection timeout for the backend service in the mysql-proxy configuration


#### Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Story Points: 2
• Sprint: DB_F20_06
• Team: Data Access and Database
• Urgent?: No

#### Description

During stress testing of Qserv, it was observed that mysql-proxy may fail to connect to its backend MySQL server and report the following error to its clients:

 ERROR 1105 (HY000): (proxy) all backends are down 

Typically, this is seen when the number of simultaneous clients exceeds a certain limit. Specifically, it has been observed (and reproduced on many occasions) that 2 out of 40 simultaneously launched query processing requests would fail instantly with the error shown above.
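A minimal sketch of how such a reproduction could be scripted, assuming a hypothetical `run_query` callable that opens a connection through the proxy and submits one query (neither the callable nor the helper name comes from Qserv). It launches the requests from a thread pool and counts only the failures carrying the error text quoted above:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Text of the ERROR 1105 message quoted in the Description.
BACKENDS_DOWN = "all backends are down"


def count_backend_failures(run_query: Callable[[int], None], num_clients: int = 40) -> int:
    """Launch num_clients requests at once and count 'all backends are down' failures."""

    def attempt(client_id: int) -> bool:
        try:
            # Hypothetical callable: connect via mysql-proxy and submit one query.
            run_query(client_id)
        except Exception as exc:
            if BACKENDS_DOWN in str(exc):
                return True
            raise  # any other error is not the one under study
        return False

    # One worker per client so that all requests are in flight together.
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        return sum(pool.map(attempt, range(num_clients)))
```

In a real test, `run_query` would wrap whatever MySQL client library is used against the proxy port.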

Further analysis (done by Andy Salnikov) has revealed the following messages in the proxy's log file:

 2020-07-11 01:59:18: (debug) proxy-plugin.c:229: connecting to 127.0.0.1:3306 timed out after 2.00 seconds. Trying another backend.
 2020-07-11 01:59:18: (critical) proxy-plugin.c.1869: Cannot connect, all backends are down.
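For scanning a larger log for the same symptom, a small helper along these lines may be useful. The pattern is modeled only on the debug message quoted above, so it is an assumption that other mysql-proxy versions word the message the same way; the function name is hypothetical:

```python
import re

# Pattern derived from the single debug line quoted in this ticket.
TIMEOUT_RE = re.compile(
    r"connecting to (?P<backend>\S+) timed out after (?P<seconds>\d+(?:\.\d+)?) seconds"
)


def find_connect_timeouts(log_text: str) -> list[tuple[str, float]]:
    """Return (backend address, timeout in seconds) for each timed-out connect."""
    return [
        (match.group("backend"), float(match.group("seconds")))
        for match in TIMEOUT_RE.finditer(log_text)
    ]
```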

According to the documentation portal for the proxy, there is a command line option which allows increasing the timeout:

 --proxy-connect-timeout= 

Hence, the goal of this ticket is to investigate whether setting this parameter to the following value when starting the proxy solves the problem:

 --proxy-connect-timeout=30 
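As an illustration only, the proxy's command line could be assembled as in the sketch below. The backend address is taken from the log message above, the timeout is assumed to be in seconds (as that message suggests), and the helper name is hypothetical. The actual Qserv startup script is not shown in this ticket and presumably passes further options:

```python
def proxy_command(backend: str = "127.0.0.1:3306", connect_timeout: int = 30) -> list[str]:
    """Build an argument list for starting mysql-proxy with a longer connect timeout."""
    return [
        "mysql-proxy",
        f"--proxy-backend-addresses={backend}",
        # Assumed to be in seconds, matching "timed out after 2.00 seconds" in the log.
        f"--proxy-connect-timeout={connect_timeout}",
    ]
```

The resulting list can be passed to `subprocess.Popen` on a host where mysql-proxy is installed.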

Deploy the updated version of Qserv at the development cluster at NCSA and test it. Report results and limitations in this ticket.

#### Activity

Igor Gaponenko added a comment - - edited

In a preliminary study of the potential usefulness of this parameter, a large-scale test was conducted and reported in the context of DM-25891. According to the test, setting the parameter to 30 made it possible to successfully launch up to 80 simultaneous "shared scan" queries without seeing the errors reported in the Description section of the current ticket.

Igor Gaponenko added a comment - PR: https://github.com/lsst/qserv/pull/555
Andy Salnikov added a comment -


#### People

Assignee:
Igor Gaponenko
Reporter:
Igor Gaponenko
Reviewers:
Andy Salnikov
Watchers:
Andy Salnikov, Fritz Mueller, Igor Gaponenko, Nate Pease