Data Management / DM-27258

Test performance and scalability of QHTTP


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None

      Description

      A goal of this effort is to test the performance and scalability of qhttp - an embedded HTTP server for C++ applications: https://github.com/lsst/qserv/tree/master/core/modules/qhttp
      This server is extensively used by the Qserv Replication/Ingest System.

      The tests should include (as a minimum):

      • the number of simple requests processed each second for one client (this test would also test the server's routing to the request handlers)
      • the same metrics for many clients
      • the data throughput for large amounts of data sent into the server and received back from it
      • the effect of the number of BOOST ASIO threads on the performance and scalability of the server

      Tests will use the Python module requests on the client side.
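      For illustration, a minimal sketch of such a single-client measurement with requests might look as follows; the host, port, endpoint, and payload here are assumptions, not the exact parameters of the tests:

      # Hypothetical single-client benchmark sketch (host, port, endpoint, and payload are assumed).
      import time
      import requests

      URL = "http://lsst-qserv-master03:25088/service/echo"  # assumed server endpoint
      PAYLOAD = b"x"                                          # 1 byte per request
      DURATION_S = 10                                         # measure for 10 seconds

      session = requests.Session()
      num_requests = 0
      start = time.time()
      while time.time() - start < DURATION_S:
          resp = session.get(URL, data=PAYLOAD)
          resp.raise_for_status()
          num_requests += 1
      print(f"{num_requests / (time.time() - start):.0f} req/s")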

      Results will be reported as comments to this ticket.



            Activity

            gapon Igor Gaponenko added a comment -

A possible bug was found in the implementation of qhttp while working on this project. The bug was reported in DM-27396. A temporary workaround disabling connection reuse was deployed in the code of this ticket.

gapon Igor Gaponenko added a comment - edited

Testing methodology, tools, and setup

            To facilitate testing the server, the following application was implemented:

            % qserv-replica-qhttp-test
            USAGE:
              <port> 
              --num-threads=[<value>]
              --report-interval-ms=[<value>]
              --debug
              --verbose
              --help
            

            The application will run the server with the specified number of threads (as requested in option --num-threads) and respond to requests on the following services:

method  service           description
GET     /service/receive  Receive data, send back a simple reply
GET     /service/echo     Receive data, send back the same data
GET     /service/random   Receive data, send back a random amount of data
PUT     /management/stop  Receive data, gracefully shut down the server

All services (except the last one, whose reply won't have the data attribute) will return a JSON object of the following schema:

            {"succeess":1,
             "data":<string>
            }
            

The server will count the total number of requests sent to all services, as well as the total amount of data received by the services and sent back to the clients. The data will be returned in the data attribute.
Statistics on the number of requests and the amount of data received/sent will be reported by the server at the interval specified in option --report-interval-ms (the default is 1000 ms). The report will be printed on the standard output.

            Here is an example of a command launching the server to listen on TCP port 25088:

            docker run --detach --rm --network host --name qserv-replica-qhttp-test \
               qserv/replica:tools-DM-27258 \
               lsst qserv-replica-qhttp-test 25088 --num-threads=1
            

The server's log (which can be followed with docker logs qserv-replica-qhttp-test) looks like this:

            2020-11-15 21:39:17.364  Process: 37 Req/s  Receive: 37888.04 KiB/s  Send: 37888.04 KiB/s
            2020-11-15 21:39:18.364  Process: 37 Req/s  Receive: 37888.04 KiB/s  Send: 37888.04 KiB/s
            2020-11-15 21:39:19.364  Process: 37 Req/s  Receive: 37888.04 KiB/s  Send: 37888.04 KiB/s
            2020-11-15 21:39:20.365  Process: 36 Req/s  Receive: 36864.04 KiB/s  Send: 36864.04 KiB/s
            2020-11-15 21:39:21.365  Process: 40 Req/s  Receive: 40960.04 KiB/s  Send: 40960.04 KiB/s
            2020-11-15 21:39:22.365  Process: 58 Req/s  Receive: 59392.06 KiB/s  Send: 59392.06 KiB/s
            2020-11-15 21:39:23.365  Process: 42 Req/s  Receive: 43008.04 KiB/s  Send: 43008.04 KiB/s
            2020-11-15 21:39:24.366  Process: 34 Req/s  Receive: 34816.03 KiB/s  Send: 34816.03 KiB/s
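A running server can then presumably be shut down gracefully via its management service (see the table of services above); a minimal sketch with requests, assuming the server listens on port 25088 of the local host:

# Hypothetical graceful-shutdown call to the PUT /management/stop service (host and port assumed).
import requests

resp = requests.put("http://localhost:25088/management/stop")
print(resp.status_code)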
            

            The client application was written in Python using module requests.
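The exact client script is not reproduced in the ticket; the sketch below only illustrates the general shape of such a load generator, with the endpoint, payload size, number of processes, and duration as illustrative assumptions:

# Hypothetical multi-process load generator for /service/echo (all parameters are assumptions).
import time
from multiprocessing import Pool

import requests

URL = "http://lsst-qserv-master03:25088/service/echo"
PAYLOAD = b"x" * 1024 * 1024   # 1 MB per request
NUM_CLIENTS = 16               # parallel client processes on this node
DURATION_S = 60                # how long each client keeps sending requests

def client(_):
    """Send echo requests for DURATION_S seconds and return the request count."""
    session = requests.Session()
    count = 0
    start = time.time()
    while time.time() - start < DURATION_S:
        resp = session.get(URL, data=PAYLOAD)
        resp.raise_for_status()
        count += 1
    return count

if __name__ == "__main__":
    with Pool(NUM_CLIENTS) as pool:
        counts = pool.map(client, range(NUM_CLIENTS))
    total = sum(counts)
    print(f"{total} requests, {total / DURATION_S:.0f} req/s, "
          f"{total * len(PAYLOAD) / DURATION_S / 1e6:.0f} MB/s")

A per-process Session allows connections to be reused where the server permits it (note the connection-reuse workaround mentioned in the first comment).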

            Running the test

The server was run on node lsst-qserv-master03 (the main master node of the "small" cluster at NCSA). The clients were run as multiple processes on 7 nodes of the cluster:

            lsst-qserv-master03
            lsst-qserv-db31
            ..
            lsst-qserv-db36
            

gapon Igor Gaponenko added a comment - edited

            Testing small requests

In this case service /service/echo was called by many clients run in parallel. Each client was sending 1 byte of data. The tests distinguished between local clients (run on the same host lsst-qserv-master03 as the server) and remote clients (run on the lsst-qserv-db[31-36] nodes).

Results are presented in the tables below:

#threads  #local_clients  #remote_clients  req/s  notes
1         1               -                880
1         2               -                1500
1         4               -                3000
1         8               -                5800
1         16              -                10000
1         16              16               13000  Started 16 clients on one remote host, then stopped the tests after observing that the server had hit 100% CPU utilization, which is the limit of the 1-thread configuration. Restarted the server with 4 threads and resumed the tests.
#threads  #local_clients  #remote_clients  req/s  notes
4         16              -                10000
4         16              16               17000  Server CPU utilization 240%; 16 remote clients run on one node.
4         16              32               23500  Server CPU utilization 310%; 32 remote clients run on 2 nodes in a 2x16 configuration.
4         16              64               27500  Server CPU utilization 340%; 64 remote clients run on 4 nodes in a 4x16 configuration. Restarted the server with 8 threads.
#threads  #local_clients  #remote_clients  req/s  notes
8         16              64               21000  Server CPU utilization 512%; 64 remote clients run on 4 nodes in a 4x16 configuration.
8         32              64               21000  Server CPU utilization 512%; 64 remote clients run on 4 nodes in a 4x16 configuration.

Apparently, there is no benefit in having more threads for serving many small requests.

gapon Igor Gaponenko added a comment - edited

            Testing large requests

In this case service /service/echo was called by many clients run in parallel. Each client was sending a large amount of data (the request size is quoted above each table). The tests distinguished between local clients (run on the same host lsst-qserv-master03 as the server) and remote clients (run on the lsst-qserv-db[31-36] nodes).

            Request size: 1MB

#threads  #local_clients  #remote_clients  req/s  sent_MB/s  recv_MB/s  notes
1         1               -                30     32         32
1         4               -                91     93         93         100% CPU utilization by the server. Restarted the test with an increased number of server threads.
4         4               -                250    256        256        290% CPU utilization.
4         8               -                350    358        358        400% CPU utilization. Restarted the test with an increased number of server threads.
8         8               -                400    400        400        550% CPU utilization.
8         16              -                560    577        577        770% CPU utilization.
#threads  #local_clients  #remote_clients  req/s  sent_MB/s  recv_MB/s  notes
8         16              16               675    690        690        785% CPU utilization; added 16 clients on one remote node. Restarted the test with an increased number of server threads.
16        16              16               1000   1020       1020       1500% CPU utilization. Restarted the test with an increased number of server threads.
32        16              16               1030   1050       1050       1825% CPU utilization.
32        16              32               1275   1300       1300       2200% CPU utilization; 2x16 remote clients.
32        16              64               1480   1525       1525       2300% CPU utilization; 4x16 remote clients.
32        32              64               1370   1400       1400       2230% CPU utilization; 32 local clients.

            Request size: 16MB

#threads  #local_clients  #remote_clients  req/s  sent_MB/s  recv_MB/s  notes
1         1               -                2      32         32
1         2               -                4      65         65         100% CPU utilization. Restarted the test with an increased number of server threads.
4         4               -                8      130        130        200% CPU utilization.
4         8               -                16     260        260        350% CPU utilization.
4         16              -                18     300        300        380% CPU utilization. Restarted the test with an increased number of server threads.
8         16              -                27     450        450        650% CPU utilization.
8         16              16               37     600        600        735% CPU utilization.
8         16              2x16             40     670        670        750% CPU utilization. Restarted the test with an increased number of server threads.
16        16              2x16             53     880        880        1230% CPU utilization.
16        16              4x16             60     900        900        1350% CPU utilization.
16        16              6x16             60     990        990        1600% CPU utilization. Restarted the test with an increased number of server threads.
32        16              6x16             72     1180       1180       1800% CPU utilization.

            Request size: 128MB

This is the final test in this series. It was run with a large number of server threads and clients.

#threads  #local_clients  #remote_clients  req/s  sent_MB/s  recv_MB/s  notes
32        16              6x32             4      550        550

The locally run clients were reporting occasional disconnects and retried the affected requests. Apparently, the data packet is too large for the server to process efficiently, or perhaps something else is going on on the client side.
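How exactly the clients retried is not recorded in the ticket; with requests, one common way to get automatic retries on dropped connections is to mount an HTTPAdapter with a urllib3 Retry policy, roughly as sketched below (the parameters and endpoint are illustrative):

# Hypothetical retry configuration for the test client (parameters and endpoint are assumed).
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, connect=3, read=3, backoff_factor=0.5)
session.mount("http://", HTTPAdapter(max_retries=retries))

# A single 128 MB echo request that will be retried on connection errors.
payload = b"x" * 128 * 1024 * 1024
resp = session.get("http://lsst-qserv-master03:25088/service/echo", data=payload)
resp.raise_for_status()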

gapon Igor Gaponenko added a comment - edited

Summary and conclusions

            Positive:

• The embedded server has demonstrated excellent performance for handling small requests sent by multiple parallel clients.
            • The aggregate bandwidth of the server for handling parallel requests of 1MB and 16MB in size was also excellent.
            • The performance of the server scales as expected with the number of BOOST ASIO threads.
• The server was stable during the tests. Not a single crash occurred while processing many millions of requests.

            Problems:

• The server didn't appear to do well for very large request sizes (128MB was tested). The overall throughput was moderate, and clients were reporting occasional disconnects.
            fritzm Fritz Mueller added a comment -

            Please just add the Jira ticket number for the reuse-socket issue to the comment next to the commented-out code, so somebody reading through can find more detail easily. Otherwise, looks fine – thanks!


              People

              Assignee:
              gapon Igor Gaponenko
              Reporter:
              gapon Igor Gaponenko
              Reviewers:
              Fritz Mueller
              Watchers:
              Fritz Mueller, Igor Gaponenko, Nate Pease
