  1. Data Management
  2. DM-28183

Ingesting table contributions via named pipes at Qserv Ingest workers



    • Type: Improvement
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:



      In the current implementation of the Ingest system, the input files pulled from an object store (or a locally mounted filesystem) are temporarily stored on the same filesystem where the Qserv worker keeps its data (MySQL). Because of that, there is one additional write (to store the input file) and one additional read (to pull the file and send it to MySQL for further ingest). The temporary file is needed because the input files (pulled from an object store) get extended with an extra column carrying transaction identifiers ... and potentially for sanitizing the payload of the files. Therefore, for each input file, there are at least 3 disk operations:

      1. write
      2. read
      3. write

      It means that for 150 MB/s of the network I/O into a worker node there will be (at least) 450 MB/s of the disk I/O. For a typical disk bandwidth of 1 GB/s (for streaming I/O) of an HDD-based data RAID, this may limit the aggregate performance of the worker by 150 MB/s.
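      The arithmetic above can be checked with a minimal sketch. The function name and its parameters are illustrative only; the figures are the ones quoted in this ticket.

      ```python
      def disk_io_mb_s(network_mb_s, disk_ops_per_byte):
          """Disk bandwidth consumed when each ingested byte touches the disk this many times."""
          return network_mb_s * disk_ops_per_byte


      # Current scheme: write the temporary file, read it back, then the final write.
      staged = disk_io_mb_s(150, 3)
      # With steps 1 and 2 removed, only the final write remains.
      piped = disk_io_mb_s(150, 1)
      print(staged, piped)
      ```

      Under these assumptions the staged path costs three times the network rate in disk I/O, while the piped path costs the network rate once.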

      Proposed solution

      The implementation can (and must) be improved to write the temporary files to named pipes (file-like buffers in memory). This would eliminate the extra steps 1 (write) and 2 (read) and improve the performance of the system.
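      A minimal sketch of the idea, not the Qserv implementation: a writer thread streams rows into a POSIX named pipe while a reader consumes them as if the pipe were a regular file. The function names, the comma-separated row format, and the choice to append the transaction identifier as the last column are all assumptions made for illustration. The reader here stands in for whatever loads the data into MySQL.

      ```python
      import os
      import tempfile
      import threading


      def produce(fifo_path, rows, transaction_id):
          """Write rows into the pipe, appending the transaction id as an extra column."""
          # Opening a FIFO for writing blocks until a reader opens the other end.
          with open(fifo_path, "w") as fifo:
              for row in rows:
                  fifo.write(f"{row},{transaction_id}\n")


      def ingest_via_pipe(rows, transaction_id):
          """Stream extended rows through a named pipe and return what the reader received."""
          workdir = tempfile.mkdtemp()
          fifo_path = os.path.join(workdir, "contrib.fifo")
          os.mkfifo(fifo_path)  # POSIX only; no payload data is stored on disk
          writer = threading.Thread(target=produce, args=(fifo_path, rows, transaction_id))
          writer.start()
          try:
              # Stand-in for the database loader (hypothetical): reads the pipe like a file.
              with open(fifo_path) as fifo:
                  received = [line.rstrip("\n") for line in fifo]
          finally:
              writer.join()
              os.remove(fifo_path)
              os.rmdir(workdir)
          return received
      ```

      For example, `ingest_via_pipe(["1,foo", "2,bar"], 17)` passes two rows through the pipe and returns them with the identifier appended. The writer and the reader must run concurrently, because each side of a FIFO blocks until the other side is open.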

      Other actions

      • Test the performance improvements during the large-scale ingest into the "large" Qserv cluster at NCSA. Report observations in comments to this ticket.
      • Extend the client-worker protocol (REST API) to allow a client to specify an option selecting the desired behavior (locally cached files or pipes). The default would probably be pipes.
      • Document changes made to the REST API.
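      One possible shape for the protocol option, as a sketch only. The field names (`url`, `transaction_id`, `staging`) and the values `"pipe"` and `"file"` are hypothetical and do not come from the existing REST API.

      ```python
      # Hypothetical option values; the real API may name these differently.
      STAGING_MODES = ("pipe", "file")


      def build_ingest_request(url, transaction_id, staging="pipe"):
          """Build a request body selecting how the worker stages a contribution."""
          if staging not in STAGING_MODES:
              raise ValueError(f"staging must be one of {STAGING_MODES}, got {staging!r}")
          return {"url": url, "transaction_id": transaction_id, "staging": staging}
      ```

      Defaulting the argument to `"pipe"` mirrors the tentative default proposed above, and a client that needs the old behavior would pass `staging="file"` explicitly.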



          There are no comments yet on this issue.


            gapon Igor Gaponenko
            gapon Igor Gaponenko
            Fritz Mueller, Igor Gaponenko, Nate Pease


