Data Management / DM-28006

Improved error reporting in the ingest worker for failures when pulling files from HTTP/HTTPS servers



    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:



      This ticket makes a further improvement on top of what was implemented in DM-27772.

      Reporting HTTP codes for failures to pull files from an object store

      The current implementation of the worker ingest service uses libcurl to pull files from remote object stores via the HTTP/HTTPS protocol. The implementation does not report errors precisely (for example, by including the specific HTTP code returned by a server) should any problem occur when pulling data from the server. All HTTP response codes at or above 400 are treated as errors and reported to the worker's log and back to the requestor as:

      operation failed due to: curl_easy_perform() failed, error: 'HTTP response code said error (on HTTP error codes 400 or greater)', errnum: 22

      For some clients, this degree of reporting may not be sufficient. Hence, the goal of the proposed effort is to improve the implementation so that it extracts the specific error code and returns it to the ingest workflow that requested the file transfer.

      Specifically, the new code will invoke the following operation after each call to curl_easy_perform:

      long code;
      curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);

      Any code that differs from 200 will be included in the error message logged by the worker and returned to the client workflow. The code should also be included in the extended dictionary error_ext of the JSON object.


      NOTE: the extended error dictionary is optional for requests where a reason for a problem could be determined.
      More info on this subject can be found at https://ec.haxx.se/libcurl-http/libcurl-http-responses
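
      As an illustration of how the extracted code could end up in the reported message, here is a minimal sketch. The function name describeTransferFailure and the ", HTTP code: " suffix are hypothetical and not taken from the Qserv code base. The sketch deliberately does not link against libcurl: it assumes the caller has already obtained httpCode through curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code) as described above. Skipping the value 0 is an assumption added here, based on libcurl reporting 0 when no server response code was received.

```cpp
#include <string>

// Hypothetical helper (not part of Qserv): builds the text reported for a
// failed file transfer.
//   curlError - the error text produced for the failed curl_easy_perform() call
//   httpCode  - the value obtained via CURLINFO_RESPONSE_CODE
// The HTTP code is appended only when it is informative: libcurl stores 0 when
// no response code was received, and 200 indicates success.
std::string describeTransferFailure(std::string const& curlError, long httpCode) {
    std::string msg = "operation failed due to: " + curlError;
    if (httpCode != 0 && httpCode != 200) {
        msg += ", HTTP code: " + std::to_string(httpCode);
    }
    return msg;
}
```

      With this sketch, a 404 response would produce a message ending in ", HTTP code: 404", while a connection failure with no response would leave the message unchanged.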

      Differentiating failures

      Sometimes, a failure in pulling a file from an object store (or a locally mounted filesystem) does not require restarting a transaction because no permanent changes have been made to the destination table. In this case, an ingest workflow may benefit from knowing that fact and repeat ingesting the same contribution without restarting the whole transaction (which could be quite expensive, depending on the amount of data ingested so far). Hence, the proposed improvement is to further extend the dictionary error_ext with the flag retry_allowed.


      Any nonzero value of the flag indicates that it is safe to repeat the file ingest without restarting the transaction.
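
      On the workflow side, the flag could drive a decision like the one sketched below. This is only an illustration under stated assumptions: the names NextStep and chooseNextStep are hypothetical, and the cap on the number of attempts is a choice made for this sketch rather than something specified in this ticket.

```cpp
// Hypothetical client-side decision (not part of Qserv): what an ingest
// workflow could do after a failed contribution.
enum class NextStep { RetryContribution, AbortTransaction };

// retryAllowed - value of the retry_allowed flag from error_ext
//                (any nonzero value means a retry is safe)
// attemptsMade - number of attempts already made for this contribution
// maxAttempts  - assumed workflow-defined cap on attempts
NextStep chooseNextStep(int retryAllowed, int attemptsMade, int maxAttempts) {
    if (retryAllowed != 0 && attemptsMade < maxAttempts) {
        return NextStep::RetryContribution;
    }
    return NextStep::AbortTransaction;
}
```

      Bounding the number of attempts keeps a persistently failing source (for example, a file that is permanently missing) from being retried indefinitely.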


            salnikov Andy Salnikov added a comment -

            I approved PR but check my comments, I'm not sure that I understand what "retry" means in your case.



              gapon Igor Gaponenko
              gapon Igor Gaponenko
              Andy Salnikov
              Andy Salnikov, Fabrice Jammes, Fritz Mueller, Igor Gaponenko, Nate Pease