  1. Data Management
  2. DM-28006

Improved error reporting in the ingest worker for failures when pulling files from HTTP/HTTPS servers

    Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      Goals

      This ticket builds on the improvements implemented in DM-27772.

      Reporting HTTP codes for failures to pull files from an object store

      The current implementation of the worker Ingest service uses libcurl to pull files from remote object stores via the HTTP/HTTPS protocol. The implementation does not report errors precisely (for example, it does not include the specific HTTP code returned by a server) when a problem occurs while pulling data from the server. All HTTP codes at or above 400 are treated as errors and reported to the worker's log and back to the requestor as:

      operation failed due to: curl_easy_perform() failed, error: 'HTTP response code said error (on HTTP error codes 400 or greater)', errnum: 22
      

      For some clients, this level of reporting may not be sufficient. Hence, the goal of the proposed effort is to improve the implementation so that it extracts the specific error code and returns it to the ingest workflow that requested the file transfer.

      Specifically, the new code will invoke the following operation after each call to curl_easy_perform:

      long code = 0;  // remains 0 if no server response code was received
      curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
      

      Any code that differs from 200 will be included in the error message logged by the worker and returned to the client workflow. The code should also be included in the extended dictionary error_ext of the JSON object:

      {"success":0,
       "error":<string>,
       "error_ext":{
         "http_error":<number>,
         "system_error":<number>,
         ..
       }
      }
      

      NOTE: the extended error dictionary is optional and is provided for requests where the reason for a problem could be determined.
      More info on this subject can be found at https://ec.haxx.se/libcurl-http/libcurl-http-responses
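
      A minimal sketch (C++17, standard library only) of how the error object above might be assembled once the HTTP code is known. The names makeErrorJson and escapeJson are hypothetical and are not taken from the Qserv code base. Leaving out error_ext entries whose values are unknown is also an assumption of this sketch.

```cpp
#include <cstdio>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: escape characters that may not appear raw
// inside a JSON string (quotes, backslashes, control characters).
std::string escapeJson(std::string const& in) {
    std::ostringstream out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out << "\\\""; break;
            case '\\': out << "\\\\"; break;
            case '\n': out << "\\n"; break;
            case '\r': out << "\\r"; break;
            case '\t': out << "\\t"; break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", c);
                    out << buf;
                } else {
                    out << c;
                }
        }
    }
    return out.str();
}

// Hypothetical function: build the error reply described in this ticket.
// "error_ext" is emitted only if at least one extended value is known.
std::string makeErrorJson(std::string const& error,
                          std::optional<long> httpError,
                          std::optional<int> systemError) {
    std::ostringstream out;
    out << "{\"success\":0,\"error\":\"" << escapeJson(error) << "\"";
    if (httpError || systemError) {
        out << ",\"error_ext\":{";
        bool first = true;
        if (httpError) {
            out << "\"http_error\":" << *httpError;
            first = false;
        }
        if (systemError) {
            if (!first) out << ",";
            out << "\"system_error\":" << *systemError;
        }
        out << "}";
    }
    out << "}";
    return out.str();
}
```

      With these assumptions, makeErrorJson("not found", 404, std::nullopt) yields {"success":0,"error":"not found","error_ext":{"http_error":404}}.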

      Differentiating failures

      Sometimes a failure to pull a file from an object store (or a locally mounted filesystem) does not require restarting the transaction because no permanent changes were made to the destination table. In this case, an ingest workflow may benefit from knowing that fact and retry ingesting the same contribution without restarting the whole transaction (which could be quite expensive, depending on the amount of data ingested so far). Hence, the proposed improvement is to further extend the dictionary error_ext with the flag retry_allowed:

      {"success":0,
       "error":<string>,
       "error_ext":{
         "retry_allowed":<number>,
         ..
       }
      }
      

      Any nonzero value of the flag indicates that it is safe to repeat the file ingest without restarting the transaction.
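
      A minimal sketch (C++17) of how a client workflow might act on the retry_allowed flag. The types ErrorExt and NextStep and the function decideNextStep are hypothetical. Treating a missing dictionary or a missing flag as "retry not allowed" is a conservative assumption of this sketch, not something the ticket specifies.

```cpp
#include <optional>

// Hypothetical outcome of inspecting a failed ingest reply.
enum class NextStep { RetryContribution, RestartTransaction };

// Hypothetical client-side view of the optional "error_ext" dictionary.
struct ErrorExt {
    std::optional<long> retryAllowed;  // empty if the worker did not set the flag
};

// Decide what to do after a failed contribution.
// Only an explicit nonzero flag permits repeating the same contribution
// within the current transaction; everything else falls back to the
// more expensive path (assumption: absence of the flag means "unsafe").
NextStep decideNextStep(std::optional<ErrorExt> const& errorExt) {
    if (errorExt && errorExt->retryAllowed && *errorExt->retryAllowed != 0) {
        return NextStep::RetryContribution;
    }
    return NextStep::RestartTransaction;
}
```

      A real workflow would probably also cap the number of retries per contribution, since a persistent server-side problem could otherwise be retried indefinitely.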

      Other notes

            Activity

            salnikov Andy Salnikov added a comment -

            I approved PR but check my comments, I'm not sure that I understand what "retry" means in your case.

              People

              Assignee:
              gapon Igor Gaponenko
              Reporter:
              gapon Igor Gaponenko
              Reviewers:
              Andy Salnikov
              Watchers:
              Andy Salnikov, Fabrice Jammes, Fritz Mueller, Igor Gaponenko, Nate Pease
