I want to try to improve indexing in the DiaObject table to make both search and insertion faster.
Produce performance figures for initial L1 database design
Performance plots with AP prototype on SSD
Implement parallel processing in L1DB prototype
The idea is to avoid selecting based on validityEnd=NULL (meaning last version of the Object) and instead store latest versions of Objects in a separate table. If we can avoid using validityEnd altogether (this depends on other queries too, not just AP-related queries) then we do not need to store or update it which will save one UPDATE query (truncating intervals). SELECT on a table which has only latest version of Object should also be faster. This of course implies some duplication of Objects (latest version will be store both in regular Object and separate latest-version-Object table) but it may still provide better performance overall. Update of the latest-version-Object table could be more efficient as it is logically done "in-place", MySQL has a special REPLACE statement for that, though in Postgres we have to implement it via DELETE + INSERT. Also, "in-place" update is not really in-place as the isolation mechanism will need to make sure that transactions do not see each other's data.
First set of plots to demonstrate that select from separate last-version-Object table is indeed faster.
First plot is for the timing of Object select with previous single-table implementation:
And this is select time from a separate dedicated last-version-Object table:
Select is now more than twice faster.
Object update timing is not so impressive though for Postgres (again old vs new):
It looks like much of a gain on SELECT is lost when updating two tables. I wonder if there is more efficient way to update last-version-Object in Postgres (sort of "in-place"). Currently timing of the updates to both Object tables is almost identical and I'd like to think that update to shorter last-version-Object table can be faster.
Here is MySQL situation for SELECT timing (old and new):
Good improvement, again twice as fast.
And here is the situation with Object(s) update timing in mysql
The update speed actually improves, but this most likely because old timing was done with unoptimized server configuration. What is more important is that MySQL timing for updates is better than Postgres.
And here is for reference total select and update times (including all tables, not just Object) for Postgres:
It is still not looking too good, especially for update times, given that this simulation covers ~2 weeks of data taking.
Closing the ticket, no review necessary. Will do more testing in next sprint.
And finally moved my notebooks to github, here is the notebook used for plots in this ticket: https://github.com/lsst-dm/l1dbproto-notebooks/blob/master/L1%20Prototype%20timing%20-%20DiaObjectLast%20table.ipynb. CSV data file are in the same repo.