Before I move on to reimplementing ap_proto with the ideas above, I want to summarize the last round of tests so that we can compare things.
Here is the setup:
- all Dia tables are partitioned based on htm8 index
- DiaSource and DiaForcedSource tables are "manually partitioned" into 30-day intervals: there is a separate table (e.g. DiaSource_608) for each interval, and when querying we need to send queries to the 12-13 most recent tables.
- geometry has 189 CCDs
- I no longer create DiaObjects for forced photometry on missing DiaSources
- the cutoff for forced photometry is 30 days since the last DiaSource observation
- I did not do manual compaction for this test; it takes a long time, and I did not see any change due to compaction in previous tests
- about 35k visits were generated in this round
- ap_proto assumes an average of 10 hours of observation time per night with 45 seconds per visit, giving 800 visits per night. The exact number mostly does not matter, since most results so far are presented as a function of the number of visits. Some values do depend on that 800/night figure, e.g. the 30-day forced photometry cutoff corresponds to 24,000 visits in ap_proto terms.
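For concreteness, the bookkeeping in the setup above can be sketched in Python. This is a hedged illustration, not ap_proto code: the helper name, the 12-month history depth, and the meaning of the numeric table suffix (a 30-day interval index) are assumptions based on the description.

```python
# Visit-rate arithmetic from the setup: 10 hours of observing per night
# at 45 s per visit gives 800 visits/night.
VISITS_PER_NIGHT = 10 * 3600 // 45   # -> 800
PERIOD_DAYS = 30                     # width of one "manual partition"

def source_tables(current_interval, months_of_history=12):
    """Return the per-interval DiaSource table names to query, newest first.

    `current_interval` is the 30-day interval index assumed to be encoded in
    the table suffix (e.g. 608 for DiaSource_608).  We query that table plus
    enough older ones to cover the requested history; because the current
    interval is only partially filled, one extra table is needed, which is
    why 12 months of history means 12-13 tables.
    """
    n_tables = months_of_history + 1
    return [f"DiaSource_{current_interval - i}" for i in range(n_tables)]

# Forced-photometry cutoff expressed in visits: 30 nights x 800 visits/night.
cutoff_visits = PERIOD_DAYS * VISITS_PER_NIGHT  # -> 24000
```

This also makes the per-query fan-out visible: every client has to issue one select per table in the `source_tables` list, which is where the "large number of queries due to per-month tables" mentioned below comes from.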
As I already mentioned, for this run I observe a significant number of timeouts. These are most likely due to the large number of clients and the number of queries caused by the per-month tables (and the very small number of servers). Timeouts appear randomly, and when there are no timeouts things seem to work reasonably well.
Here is a set of plots from Grafana.
Write latency for each table (all latencies are measured on server side):
Interesting here is that the average latency for DiaObject is significantly lower than what I saw in the previous test (it was ~200 msec before).
Read latency for each table:
An interesting observation here is that master02 latency is higher than on the two other nodes, and there are also lots of fluctuations on that node.
Timing for inserts (timing is measured on client side):
and for selects:
Select time for DiaSource tables looks worse than in previous tests; it rises to ~4 sec after 30k visits, which does not look good (though some fraction of that time is CPU time on the client side).
Counters for the number of queries sent by each of the 189 clients:
Number of records retrieved from database for each client:
Number of records stored (per client/CCD):
On the last plot the number of forced sources levels off after one month (24k visits).
On these plots things are generally smoothed by Grafana; looking at narrower intervals there is a lot more variation, in latencies for example.