Data Management / DM-10366

Plan scaling experiments for KPM scripts


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels: None

      Description

      Investigate further how the KPM scripts can be split to handle large query result scaling, through either the python mysql library or the mysql shell client. Collect lessons learned and methods for approaching the solution for the F17 tests.

        Activity

Vaikunth Thukral added a comment -

            The KPM testing module used for the F16 large scale tests had significant changes done to it in order to handle the new, larger amount of data. In this ticket we note down all lessons learned and items identified as problematic or needing updating in order to handle all future KPM tests. At the core of this issue are the following points:

1. Concurrent mysql connection handling
            2. Large query result handling
            3. Limitations of python threading and the python mysql library
            4. Different cursor object classes in the python mysql library with different query/result/intermediate-data handling

            We elaborate on all 4 points here and discuss potential long term solutions to plan out KPM work for F17 and beyond.

Vaikunth Thukral added a comment - edited

            1. Concurrent mysql connection handling

The KPM script used the following code block to generate a database connection to the mysql proxy in each thread for the queries being tested. Up to ~100 threads/connections could be attempted at the same instant:

            conn = MySQLdb.connect(host=master,
                                   port=4040,
                                   user='qsmaster',
                                   passwd='',
                                   db='LSST20')
            cursor = conn.cursor()
            

            Frequently, a few out of the ~100 connections would fail to establish, so we fixed it with a try/except clause:

# Runs inside each worker thread's retry loop; `continue` retries the connection attempt.
try:
    conn = MySQLdb.connect(host=master,
                           port=4040,
                           user='qsmaster',
                           passwd='',
                           db='LSST20',
                           cursorclass=MySQLdb.cursors.SSCursor)
except MySQLdb.Error as exc:
    logging.info("Could not create connection, retrying: %s", exc)
    time.sleep(1)
    continue
            

            This was an easy update and it immediately fixed the issue. All threads were able to start correctly and continue to submit further queries and maintain their connection.
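
For context, here is a minimal sketch of how such a retry wrapper could look inside each worker thread. The function name, attempt limit, and RuntimeError fallback are illustrative assumptions, not the actual KPM code:

import logging
import time

import MySQLdb
import MySQLdb.cursors

def connect_with_retry(master, max_attempts=5):
    # Illustrative helper: retry the proxy connection a few times before giving up.
    for attempt in range(max_attempts):
        try:
            return MySQLdb.connect(host=master,
                                   port=4040,
                                   user='qsmaster',
                                   passwd='',
                                   db='LSST20',
                                   cursorclass=MySQLdb.cursors.SSCursor)
        except MySQLdb.Error as exc:
            logging.info("Could not create connection, retrying: %s", exc)
            time.sleep(1)
    raise RuntimeError("Could not connect to %s after %d attempts" % (master, max_attempts))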

Vaikunth Thukral added a comment - edited

            2. Large query result handling

While the KPM tests were running their iterative queries within each thread, an obvious bottleneck was observed for queries with large results. The typical "Low Volume" queries ran as intended within the expected time frame, but the "High Volume" queries included shared scans on the Object and Source tables, as well as JOINs involving these tables. Since these queries are scheduled concurrently for each table, a certain scan on the Object table (the same one used in the S15 tests) produced a very large result set:

            SELECT ra, decl, u_psfFlux, g_psfFlux, r_psfFlux FROM Object WHERE y_shapeIxx BETWEEN 20 AND 40
            

While the system can handle this query individually, and even under a lighter load, running it as part of the stress test (and in the planned operational scenario) exposed a major issue: its large result is returned while smaller and medium-sized results from other concurrent scans are also being returned. This query has a result size of ~6GB, and it appeared that, combined with the results of the other queries, some memory limit for handling intermediate result stream data in the python mysql library was reached, eventually causing the OS to invoke the OOM killer on the running process and thereby ending the testing script. This main issue led to the further investigation discussed in the next points.

Note: This issue does not have a solution yet; it was circumvented for the F16 tests by reducing the query coverage from BETWEEN 20 AND 40 to BETWEEN 20 AND 20.2, as sketched below.
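
For illustration only, a minimal sketch of issuing the narrowed workaround query through the python mysql library; the conn object is assumed to be an established MySQLdb connection as in the snippets above:

# Narrowed F16 workaround: the same Object scan with a reduced BETWEEN range.
q = ("SELECT ra, decl, u_psfFlux, g_psfFlux, r_psfFlux "
     "FROM Object WHERE y_shapeIxx BETWEEN 20 AND 20.2")
cursor = conn.cursor()
cursor.execute(q)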

Vaikunth Thukral added a comment -

            3. Limitations of python threading and the python mysql library

As noted above, concurrent queries are submitted in many threads in the KPM testing module. This is done with the python threading library, and each thread then manages multiple iterations of the same type of query. When these queries return large results all at the same time, poor memory management under the less-than-optimal python threading model is a likely cause of the OOM killer being invoked on the process. To test this theory that the python mysql library runs poorly under python threading, we tried switching the KPM module to use the mysql shell client instead of the python client:

import logging
import shlex
import subprocess

# dbName, q, outfile, and threadId are provided by the calling thread.
shell_cmd = "mysql --host ccqserv125 --port 4040 --user qsmaster {0} -e \"{1};\"".format(dbName, q)
cmd = shlex.split(shell_cmd)

f = open(outfile, 'a')
p1 = subprocess.Popen(cmd, stdout=f)   # one mysql shell client process per query
logging.info("Point B in thread %s: Waiting for query", threadId)
p1.wait()
f.close()
            

A separate test with a simple wrapper shell script had successfully run all ~100 queries concurrently with the mysql shell client (with no issues returning large result sets), so the idea was to replicate that in the automated KPM script, which runs the multiple iterations we need for the test. However, we ran into issues similar to those with the python mysql library, plus some new ones. Primarily, as shown in the code block above, we ran each query in its own thread with a new connection for every iteration rather than running multiple queries over the same established connection. We also used the Popen method without shell=True to avoid any subshell spawning and environment-setting overhead. This also did not work: it still looked like the poorly managed python threading caused slowdowns and bottlenecks that would require much further work to investigate and debug. The slowdowns were observed regardless of whether the output was redirected to an output file (what we need) or simply dumped to /dev/null.
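
For reference, a rough sketch of how the per-thread shell-client approach described above could be wired up with the python threading library; the helper name, query list, and output file naming are illustrative assumptions, not the actual KPM module:

import logging
import shlex
import subprocess
import threading

def run_query_via_shell(threadId, dbName, q, outfile):
    # One mysql shell client process per query; output is appended to outfile.
    shell_cmd = "mysql --host ccqserv125 --port 4040 --user qsmaster {0} -e \"{1};\"".format(dbName, q)
    cmd = shlex.split(shell_cmd)
    with open(outfile, 'a') as f:
        p1 = subprocess.Popen(cmd, stdout=f)   # no shell=True, so no subshell overhead
        logging.info("Point B in thread %s: Waiting for query", threadId)
        p1.wait()

# Illustrative driver: one thread per query (the query list is a placeholder).
queries = ["SELECT COUNT(*) FROM Object"]
threads = [threading.Thread(target=run_query_via_shell,
                            args=(i, "LSST20", q, "query_{0}.out".format(i)))
           for i, q in enumerate(queries)]
for t in threads:
    t.start()
for t in threads:
    t.join()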

Vaikunth Thukral added a comment -

            4. Different cursor object classes in the python mysql library with different query/result/intermediate-data handling

The last main issue regarding output size and result handling concerns the cursor classes in the python mysql library. The original testing module did not specify a cursor class, so the library used its default class, which has its own way of handling query results; as the official documentation puts it: "This is the standard Cursor class that returns rows as tuples and stores the result set in the client."

Although many other cursor classes exist, we decided to switch to cursorclass=MySQLdb.cursors.SSCursor - "This is a Cursor class that returns rows as tuples and stores the result set in the server." Since the output issues seem to be related to result handling by the client talking to the mysql proxy, and not necessarily to Qserv itself, the class that stores results on the server side is the ideal one to use.

            The next issue relating to output handling is fetching of the result data. With the default cursor class and the default fetching method, we would see the issues as discussed in this ticket. In the code it looks like:

cursor = conn.cursor()
cursor.execute(q)
rows = cursor.fetchall()   # default cursor: pull the entire result set into client memory at once
for row in rows:
    for col in row:
        f.write("%s, " % col)
    f.write("\n")
            

            This would fetch all the results for individual queries at once and attempt to write them to disk concurrently for all running threads. This is another indication of a bottleneck causing the OOM killer to stop the process while returning data (as was observed during the tests, it is during fetching/writing of the large results to disk that the tests failed).

            Instead, we used the aforementioned new cursor class to make the connection object, and then fetch results in a more manageable fashion with fetchmany() instead of fetchall():

cursor = conn.cursor()
cursor.execute(q)
while True:
    rows = cursor.fetchmany(10000)   # fetch the result in batches of 10,000 rows
    if rows == ():                   # an empty tuple means the result set is exhausted
        break
    for row in rows:
        f.write(", ".join([str(col) for col in row]))
        f.write("\n")
            

The combination of all these changes and observations led to the KPM tests being able to run successfully at the 20% DR1 level.
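
Putting these pieces together, a minimal end-to-end sketch of the SSCursor-plus-fetchmany pattern; the helper name and batch-size parameter are illustrative assumptions, while the connection parameters follow the earlier snippets:

import MySQLdb
import MySQLdb.cursors

def stream_query_to_file(master, q, outfile, batch_size=10000):
    # Server-side cursor keeps the full result set out of client memory.
    conn = MySQLdb.connect(host=master,
                           port=4040,
                           user='qsmaster',
                           passwd='',
                           db='LSST20',
                           cursorclass=MySQLdb.cursors.SSCursor)
    cursor = conn.cursor()
    cursor.execute(q)
    with open(outfile, 'w') as f:
        while True:
            rows = cursor.fetchmany(batch_size)   # bounded batches instead of fetchall()
            if not rows:
                break
            for row in rows:
                f.write(", ".join([str(col) for col in row]))
                f.write("\n")
    cursor.close()
    conn.close()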

Vaikunth Thukral added a comment -

            Summary, notes, and thoughts for future KPM tests

With all of the points mentioned in this ticket, and the attempts at fixing them for this iteration of the KPMs, we note that the underlying problem of handling large query results is not completely fixed. These tests were run at the 20% DR1 level, and we are supposed to run the same tests every year for increasing catalog sizes, all the way up to 100% of DR1. With that in mind, we conclude that as the data size scales we must have an equally scalable testing module that can handle ever-growing output sizes as well as an increasing number of queries/threads, in order to emulate the real-world use case. So far, two options have been discussed, with more to be considered when work on the 2017 KPMs begins in the next cycle.

1. Rewrite testing module in C++
The current testing module has obvious limitations from being written in python, and as such does not look like it can support the level of stress we need to exercise on Qserv. A candidate solution is to rewrite this entire module in C++, where we can control all memory-management aspects of the query threads and their connections, result, and metadata objects. With that control, the new testing module would scale to further iterations of this test and meet our requirements through direct use of the C++ mysql library. This approach may be overkill, though, as it requires a substantial amount of coding effort.

2. Split python module across multiple submission nodes
Another important point about these tests is that they are submitted from a single dev node onto the Qserv nodes at CC-IN2P3. Since all queries are run over the same network connection to the mysql proxy, and the results are streamed back over that same link, this can cause bottlenecks and is not the best real-world representation of a Qserv workload: in real operation we do not expect this many queries and results to be submitted from, and returned to, a single node/client. Hence, another potential solution is to split the testing module so that queries are submitted from multiple nodes, giving the returned results multiple network links to stream through. This, however, is not guaranteed to scale to our requirements for the tests.

Fabrice Jammes added a comment -

            Hi,

On my side I'm testing dbbench, https://github.com/memsql/dbbench, in DM-10041; it is MemSQL's benchmarking tool. It's written in Go, which is well known for its concurrency support, and it seems to work out of the box. It also supports more use cases for concurrent queries than I have in the https://github.com/lsst/qserv_testscale repository.

The only bug I've found with it so far is that it doesn't support empty query results and NULL columns. I think this should not be difficult to fix.

Fritz Mueller added a comment -

            Thanks for the write-up, Vaikunth and Fabrice. This is good for now; we'll create specific work tickets downstream ahead of the F17 scaling tests.


              People

              Assignee:
Vaikunth Thukral
              Reporter:
Vaikunth Thukral
              Reviewers:
              Fritz Mueller
              Watchers:
              Fabrice Jammes, Fritz Mueller, Vaikunth Thukral
Votes:
0

                Dates

                Created:
                Updated:
                Resolved: