  1. Data Management
  2. DM-28606

Retrying connections to the XROOTD services from the Qserv Replication system

    Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      The problem

      The current implementation of the Replication/Ingest system does not tolerate communication failures that may occur when the system tries to establish a connection to the XRootD service. This can cause problems in Kubernetes-based deployments of the system if no DNS entry exists for the XRootD pod. When this happens, the Master Replication Controller reports the following message and exits:

      2021-02-01T16:00:22.170Z  LWP 216   ERROR  QservMgtServices::_xrdSsiService  failed to contact service provider at: qserv-xrootd-redirector:1094, error: Unable to validate contact; Name or service not known
      

      A solution

      Reinforce the implementation of class lsst::qserv::replica::QservMgtServices so that it retries failed connection attempts instead of giving up after the first failure. Bound the retries by a timeout specified via the Configuration service of the Replication/Ingest system.
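
      The retry-with-timeout idea can be sketched roughly as follows. This is only an illustration, not the code from the PR: the function name connectWithRetries, its parameters, and the error text are hypothetical, and the real implementation would take the timeout from the Configuration service and wrap the actual XRootD/SSI connection call.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper, not part of the Qserv code base.
// Calls `connect` until it returns true or the time budget `timeout` is
// used up, sleeping `retryInterval` between attempts.
// Returns the number of attempts that were made before success.
// Throws std::runtime_error if no attempt succeeds within the budget.
inline unsigned int connectWithRetries(std::function<bool()> const& connect,
                                       std::chrono::milliseconds timeout,
                                       std::chrono::milliseconds retryInterval) {
    assert(connect);  // an empty std::function cannot be called
    auto const deadline = std::chrono::steady_clock::now() + timeout;
    unsigned int attempts = 0;
    while (true) {
        ++attempts;
        if (connect()) return attempts;
        // Give up if the next attempt would start at or after the deadline.
        if (std::chrono::steady_clock::now() + retryInterval >= deadline) {
            throw std::runtime_error("failed to contact service provider after " +
                                     std::to_string(attempts) + " attempts");
        }
        std::this_thread::sleep_for(retryInterval);
    }
}
```

      A caller would pass a callable that performs one connection attempt and returns false on a transient failure such as "Name or service not known". Using a deadline rather than a fixed attempt count keeps the total wait bounded by the configured timeout, regardless of how long each individual attempt takes to fail.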

          Activity

          gapon Igor Gaponenko added a comment - PR: https://github.com/lsst/qserv/pull/600
          jgates John Gates added a comment -

          Looks good, just some style comments.

            People

            Assignee:
            gapon Igor Gaponenko
            Reporter:
            gapon Igor Gaponenko
            Reviewers:
            John Gates
            Watchers:
            Fabrice Jammes, Fritz Mueller, Igor Gaponenko, John Gates, Nate Pease
