Data Management / DM-9301

Implement parallel processing in L1DB prototype


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: L1 Database
    • Labels: None
    • Story Points: 8
    • Sprint: DB_S17_2
    • Team: Data Access and Database

      Description

      The next step in understanding alert production performance is to see whether parallelising the processing can help reduce database access overhead.

        Attachments

        1. mysql-15x15-nvme.png (20 kB)
        2. mysql-15x15-ssd.png (21 kB)
        3. mysql-5x5-ssd.png (20 kB)
        4. mysql-no-tiles.png (21 kB)
        5. pg-15x15-nvme.png (22 kB)
        6. pg-15x15-ssd.png (21 kB)
        7. pg-5x5-ssd.png (21 kB)
        8. pg-no-tiles.png (21 kB)


            Activity

            Andy Salnikov added a comment -

            Parallel processing was implemented by splitting the FOV into a number of square tiles and processing each tile in a separate forked process. All database operations are performed independently in each process, so the database server needs to scale appropriately if we want to see better performance. I tried two different scales: 5x5 tiles (25 tiles) and 15x15 tiles (~200 tiles, because of the round FOV). In the actual AP I believe we are going to process each CCD independently, so that corresponds more or less to the 15x15 case.
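
            A minimal sketch of this forking scheme, assuming a hypothetical process_tile() worker; the names here are illustrative, not the actual prototype code:

            import os

            def process_tile(tile):
                # Hypothetical per-tile worker: open a private database
                # connection and run all of this tile's queries.
                pass

            def process_visit(tiles):
                # Fork one child per tile; each child performs its database
                # operations independently, so the server sees one client
                # connection per tile.
                pids = []
                for tile in tiles:
                    pid = os.fork()
                    if pid == 0:            # child process
                        process_tile(tile)
                        os._exit(0)         # exit child without parent cleanup
                    pids.append(pid)
                for pid in pids:            # parent: reap all children
                    os.waitpid(pid, 0)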

            For both MySQL and Postgres I had to change the server configuration to allow a higher number of client connections, as the defaults are pretty low. The MySQL InnoDB engine still had some trouble with deadlocks on table rows, which I think are not real conflicts; InnoDB is likely confused by something. In any case, the errors I see are of this sort:

            _mysql_exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
             
            ...
             
              File "/sps/lsst/Qserv/salnikov/lsstsw-py3/stack/Linux64/python_mysqlclient/1.3.7.lsst1+1/lib/python/mysqlclient-1.3.7-py3.5-linux-x86_64.egg/MySQLdb/connections.py", line 280, in query
                _mysql.connection.query(self, query)
            sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') [SQL: 'REPLACE INTO "DiaObjectLast" ("diaObjectId","validityStart","validityEnd","lastNonFo
            

            Googling suggests a possible explanation: "gap locks" on the table index. The way to "fix" the issue is to "disable" gap locks by using the READ COMMITTED transaction isolation level. READ COMMITTED has other implications, but they are not important for our use case.
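
            Since the traceback above shows the prototype going through SQLAlchemy, one way to request that isolation level is at engine creation; a minimal sketch, with a placeholder connection URL:

            from sqlalchemy import create_engine

            # Placeholder URL; the relevant part is the isolation_level argument.
            # Under InnoDB's default REPEATABLE READ, gap locks on index ranges
            # can deadlock concurrent writers; READ COMMITTED avoids them.
            engine = create_engine(
                "mysql+mysqldb://user:password@host/l1db",
                isolation_level="READ COMMITTED",
            )

            The connection-limit change mentioned above is a separate server-side setting (max_connections in both MySQL and Postgres configuration).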

            Andy Salnikov added a comment - edited

            Here is a table with plots from several cases that were studied; each plot shows the per-visit processing time as a function of visit number (cells name the attached plots):

                                      MySQL                  Postgres
            single-process            mysql-no-tiles.png     pg-no-tiles.png
            5x5 tiles, SSD(SATA)      mysql-5x5-ssd.png      pg-5x5-ssd.png
            15x15 tiles, SSD(SATA)    mysql-15x15-ssd.png    pg-15x15-ssd.png
            15x15 tiles, SSD(NVMe)    mysql-15x15-nvme.png   pg-15x15-nvme.png

            Summary

            • In the single-process case (studied in previous tickets) MySQL and Postgres performance is very similar (or Postgres is marginally slower).
            • In the multiprocess case performance improves for both MySQL and Postgres, and Postgres shows better performance than MySQL.
            • For both databases the 15x15 case performs better than 5x5.
            • Postgres 15x15 performance on NVMe is better than on SSD, likely because of better concurrency in the NVMe controller (in the single-process case I did not see much difference between NVMe and SSD).
            • MySQL 15x15 performance on NVMe is slightly better than on SSD.
            • For the 15x15 case on NVMe, Postgres performs twice as fast as MySQL, or better.
            Andy Salnikov added a comment -

            I think this is more or less complete; all lessons are summarized in the comments on the plots.

            Closing, no review necessary; the code in both repos is merged into master.


              People

              Assignee:
              Andy Salnikov
              Reporter:
              Andy Salnikov
              Watchers:
              Andy Salnikov

