Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-8965

Extend Alert Production prototype with new index type

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: L1 Database
    • Labels:
    • Story Points:
      8
    • Sprint:
      DB_S17_01
    • Team:
      Data Access and Database

      Description

      I want to try to improve indexing in the DiaObject table to make both search and insertion faster.

        Attachments

        1. my-obj-select-new-index.png
          my-obj-select-new-index.png
          35 kB
        2. my-obj-select-old-index.png
          my-obj-select-old-index.png
          34 kB
        3. my-obj-update-new-index.png
          my-obj-update-new-index.png
          37 kB
        4. my-obj-update-old-index.png
          my-obj-update-old-index.png
          34 kB
        5. my-select-new-index.png
          my-select-new-index.png
          37 kB
        6. my-update-new-index.png
          my-update-new-index.png
          36 kB
        7. pg-obj-select-new-index.png
          pg-obj-select-new-index.png
          34 kB
        8. pg-obj-select-old-index.png
          pg-obj-select-old-index.png
          31 kB
        9. pg-obj-update-new-index.png
          pg-obj-update-new-index.png
          33 kB
        10. pg-obj-update-old-index.png
          pg-obj-update-old-index.png
          34 kB
        11. pg-select-new-index.png
          pg-select-new-index.png
          35 kB
        12. pg-update-new-index.png
          pg-update-new-index.png
          35 kB

          Issue Links

            Activity

            Hide
            salnikov Andy Salnikov added a comment -

            The idea is to avoid selecting based on validityEnd=NULL (meaning last version of the Object) and instead store latest versions of Objects in a separate table. If we can avoid using validityEnd altogether (this depends on other queries too, not just AP-related queries) then we do not need to store or update it which will save one UPDATE query (truncating intervals). SELECT on a table which has only latest version of Object should also be faster. This of course implies some duplication of Objects (latest version will be store both in regular Object and separate latest-version-Object table) but it may still provide better performance overall. Update of the latest-version-Object table could be more efficient as it is logically done "in-place", MySQL has a special REPLACE statement for that, though in Postgres we have to implement it via DELETE + INSERT. Also, "in-place" update is not really in-place as the isolation mechanism will need to make sure that transactions do not see each other's data.

            Show
            salnikov Andy Salnikov added a comment - The idea is to avoid selecting based on validityEnd=NULL (meaning last version of the Object) and instead store latest versions of Objects in a separate table. If we can avoid using validityEnd altogether (this depends on other queries too, not just AP-related queries) then we do not need to store or update it which will save one UPDATE query (truncating intervals). SELECT on a table which has only latest version of Object should also be faster. This of course implies some duplication of Objects (latest version will be store both in regular Object and separate latest-version-Object table) but it may still provide better performance overall. Update of the latest-version-Object table could be more efficient as it is logically done "in-place", MySQL has a special REPLACE statement for that, though in Postgres we have to implement it via DELETE + INSERT. Also, "in-place" update is not really in-place as the isolation mechanism will need to make sure that transactions do not see each other's data.
            Hide
            salnikov Andy Salnikov added a comment -

            First set of plots to demonstrate that select from separate last-version-Object table is indeed faster.

            First plot is for the timing of Object select with previous single-table implementation:

            And this is select time from a separate dedicated last-version-Object table:

            Select is now more than twice faster.

            Show
            salnikov Andy Salnikov added a comment - First set of plots to demonstrate that select from separate last-version-Object table is indeed faster. First plot is for the timing of Object select with previous single-table implementation: And this is select time from a separate dedicated last-version-Object table: Select is now more than twice faster.
            Hide
            salnikov Andy Salnikov added a comment - - edited

            Object update timing is not so impressive though for Postgres (again old vs new):

            It looks like much of a gain on SELECT is lost when updating two tables. I wonder if there is more efficient way to update last-version-Object in Postgres (sort of "in-place"). Currently timing of the updates to both Object tables is almost identical and I'd like to think that update to shorter last-version-Object table can be faster.

            Show
            salnikov Andy Salnikov added a comment - - edited Object update timing is not so impressive though for Postgres (again old vs new): It looks like much of a gain on SELECT is lost when updating two tables. I wonder if there is more efficient way to update last-version-Object in Postgres (sort of "in-place"). Currently timing of the updates to both Object tables is almost identical and I'd like to think that update to shorter last-version-Object table can be faster.
            Hide
            salnikov Andy Salnikov added a comment -

            Here is MySQL situation for SELECT timing (old and new):

            Good improvement, again twice as fast.

            Show
            salnikov Andy Salnikov added a comment - Here is MySQL situation for SELECT timing (old and new): Good improvement, again twice as fast.
            Hide
            salnikov Andy Salnikov added a comment -

            And here is the situation with Object(s) update timing in mysql

            The update speed actually improves, but this most likely because old timing was done with unoptimized server configuration. What is more important is that MySQL timing for updates is better than Postgres.

            Show
            salnikov Andy Salnikov added a comment - And here is the situation with Object(s) update timing in mysql The update speed actually improves, but this most likely because old timing was done with unoptimized server configuration. What is more important is that MySQL timing for updates is better than Postgres.
            Hide
            salnikov Andy Salnikov added a comment -

            And here is for reference total select and update times (including all tables, not just Object) for Postgres:

            and MySQL:


            It is still not looking too good, especially for update times, given that this simulation covers ~2 weeks of data taking.

            Show
            salnikov Andy Salnikov added a comment - And here is for reference total select and update times (including all tables, not just Object) for Postgres: and MySQL: It is still not looking too good, especially for update times, given that this simulation covers ~2 weeks of data taking.
            Hide
            salnikov Andy Salnikov added a comment -

            Closing the ticket, no review necessary. Will do more testing in next sprint.

            Show
            salnikov Andy Salnikov added a comment - Closing the ticket, no review necessary. Will do more testing in next sprint.
            Hide
            salnikov Andy Salnikov added a comment -

            And finally moved my notebooks to github, here is the notebook used for plots in this ticket: https://github.com/lsst-dm/l1dbproto-notebooks/blob/master/L1%20Prototype%20timing%20-%20DiaObjectLast%20table.ipynb. CSV data file are in the same repo.

            Show
            salnikov Andy Salnikov added a comment - And finally moved my notebooks to github, here is the notebook used for plots in this ticket: https://github.com/lsst-dm/l1dbproto-notebooks/blob/master/L1%20Prototype%20timing%20-%20DiaObjectLast%20table.ipynb . CSV data file are in the same repo.

              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Watchers:
              Andy Salnikov
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  CI Builds

                  No builds found.