  1. Data Management
  2. DM-25980

Increase the backend connection timeout in the mysql-proxy configuration

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      During stress testing of Qserv, mysql-proxy was observed to fail to connect to its backend MySQL server and to report the following error to its clients:

      ERROR 1105 (HY000): (proxy) all backends are down
      

      This is typically seen when the number of simultaneous clients exceeds a certain limit. Specifically, it has been observed (and reproduced on many occasions) that 2 out of 40 simultaneously launched query processing requests fail immediately with the error shown above.
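
A minimal sketch of how such a simultaneous launch could be reproduced and tallied. Only the error text comes from this ticket. The function names and the `run_query` callback (which would submit one query through the proxy and raise on failure) are hypothetical placeholders, not part of Qserv:

```python
import threading

# Error text as reported to clients in this ticket.
BACKENDS_DOWN = "ERROR 1105 (HY000): (proxy) all backends are down"


def launch_simultaneously(n_clients, run_query):
    """Start n_clients calls to run_query at (nearly) the same moment.

    run_query(i) is a caller-supplied, hypothetical function that submits
    one query through the proxy and raises an exception on failure.
    Returns one entry per client: None on success, else the error text.
    """
    barrier = threading.Barrier(n_clients)
    results = [None] * n_clients

    def worker(i):
        barrier.wait()  # hold every client until all are ready, then release
        try:
            run_query(i)
        except Exception as exc:
            results[i] = str(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def count_backends_down(results):
    """Count clients whose failure matches the proxy error quoted above."""
    return sum(1 for r in results if r is not None and "all backends are down" in r)
```

Calling `launch_simultaneously(40, run_query)` with a real query function and passing the result to `count_backends_down` would give the number of clients rejected by the proxy in one run.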

      Further analysis (done by Andy Salnikov) revealed the following messages in the proxy's log file:

      2020-07-11 01:59:18: (debug) proxy-plugin.c:229: connecting to 127.0.0.1:3306 timed out after 2.00 seconds. Trying another backend.
      2020-07-11 01:59:18: (critical) proxy-plugin.c.1869: Cannot connect, all backends are down.
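
To check how often this happens during a test run, the proxy log could be scanned for the timeout message. The sketch below assumes the debug line keeps the format quoted above; other proxy versions may word it differently, and the function name is illustrative:

```python
import re

# Pattern modeled on the debug line quoted in this ticket (an assumption
# about the log format, not a documented contract of mysql-proxy).
TIMEOUT_RE = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): \(debug\) .*?"
    r"connecting to (?P<backend>\S+) timed out after (?P<seconds>[\d.]+) seconds"
)


def find_connect_timeouts(lines):
    """Return (timestamp, backend, seconds) for each connect-timeout line."""
    hits = []
    for line in lines:
        m = TIMEOUT_RE.search(line.strip())
        if m:
            hits.append(
                (m.group("when"), m.group("backend"), float(m.group("seconds")))
            )
    return hits
```

Passing an open log file to `find_connect_timeouts` yields one tuple per timed-out connection attempt, which makes it easy to compare runs before and after a configuration change.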
      

      According to the proxy's documentation, there is a command-line option that allows the timeout to be increased:

      --proxy-connect-timeout=<seconds>
      

      See: https://downloads.mysql.com/docs/mysql-proxy-relnotes-en.pdf

      Hence, the goal of this ticket is to investigate whether starting the proxy with the following value of this parameter solves the problem:

      --proxy-connect-timeout=30
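
As an illustration, a start-up script might assemble the proxy command line as sketched below. Only `--proxy-connect-timeout` is taken from this ticket. The binary name, the helper `proxy_command`, and passing the backend via `--proxy-backend-addresses` are assumptions about the deployment, and the actual Qserv start-up scripts may differ:

```python
import shlex


def proxy_command(connect_timeout_s=30,
                  backend="127.0.0.1:3306",
                  binary="mysql-proxy"):
    """Build an argv list for starting mysql-proxy with a connect timeout.

    The backend address matches the one seen in the log excerpt; the
    binary name and the other option are assumed for illustration.
    """
    if connect_timeout_s <= 0:
        raise ValueError("connect timeout must be positive")
    return [
        binary,
        f"--proxy-backend-addresses={backend}",
        f"--proxy-connect-timeout={connect_timeout_s}",
    ]


if __name__ == "__main__":
    # Print the command as a single shell-safe string for inspection.
    print(shlex.join(proxy_command()))
```

Keeping the timeout as a parameter makes it straightforward to repeat the stress test with several values rather than hard-coding 30.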
      

      Deploy the updated version of Qserv on the development cluster at NCSA and test it. Report the results and any limitations in this ticket.

            Activity

            gapon Igor Gaponenko added a comment (edited)

            In a preliminary study of the potential usefulness of this parameter, a large-scale test was conducted and reported in the context of DM-25891. According to that test, setting the parameter to 30 made it possible to launch and successfully run up to 80 simultaneous "shared scan" queries without seeing the errors reported in the Description section of this ticket.

            gapon Igor Gaponenko added a comment - PR: https://github.com/lsst/qserv/pull/555
              People

              Assignee:
              gapon Igor Gaponenko
              Reporter:
              gapon Igor Gaponenko
              Reviewers:
              Andy Salnikov
              Watchers:
              Andy Salnikov, Fritz Mueller, Igor Gaponenko, Nate Pease
              Votes:
              0
