Few plots for baseline schema with the above indices.
Note: there is still no reading of Source and ForcedSource tables currently, only Object table is read, so this numbers are far from realistic. I expect that reading Forced/Sources to be slower than Objects.
The data covers visits 0 to ~3800 which corresponds approximately 5 nights of LSST data collection. There are 58M DiaSources and 75M DiaObjects and DiaForcedSources in database after that number of visits.
MySQL - wall-clock (real) time in seconds to read (blue dots) and write (red crosses) data as a function of visit number, starting with empty database. The gaps are due to lost logs (AFS ate it)
Here is the same plots in box-and-whiskers style binned in 100 visits:
Similar plots for PostgreSQL:
Writing time can be split into individual table update time: truncating Objects, inserting new Objects, inserting Sources, and inserting Forced Sources. Here is the break up of average writing times for visits around 3800:
| DiaObject select
| Total Write
| . DiaObject truncate
| . DiaObject insert
| . DiaSource insert
| . DiaForcedSource insert
Few observations from these numbers:
- performance is nowhere close to what we need for production
- watching the system shows >90% busy for I/O and servers are not maxing out CPU.
- MySQL is slower in selecting DiaObjects, this is probably explained by non-optimal data locality (because of primary index on Object table and InnoDB ordering). For postgres data locality seems to be better because data is stored in order of arriving and that order preserves locality.
- truncate time is worse for postgres, this could probably be improved if we use HOT updates
- the numbers are very likely get worse with the increasing number of objects per table