A bunch of plots from Grafana; these are all time series covering the whole test. Most of them are based on metrics produced by Cassandra, some come from ap_proto logs (the same data as in the plots above).
Timing of the queries for selecting and storing (real/wall clock time):
Counts of the objects selected per visit and per CCD. The first plot shows DiaSource and DiaObject; for DiaObject two counts are shown - before and after filtering. Before filtering counts all records returned by the query, which covers everything in the spatial partitions enclosing the CCD region; after filtering counts only the records inside the CCD region. After 12 months DiaSource counts have plateaued, while DiaObject continues to rise. The second plot includes DiaForcedSource:
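As an aside on the before/after filtering distinction, here is a minimal sketch of what that step looks like, assuming a DiaObject table partitioned on a spatial pixel column. The table/column names ("DiaObjectLast", "apdb_part", ra/decl) and the ccd_region_contains helper are illustrative assumptions, not the actual ap_proto code.

```python
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-host"])   # hypothetical contact point
session = cluster.connect("apdb")       # hypothetical keyspace name

# One partition (spatial pixel) per query; table/column names are assumptions.
select_stmt = session.prepare(
    'SELECT * FROM "DiaObjectLast" WHERE "apdb_part" = ?'
)

def select_dia_objects(pixel_ids, ccd_region_contains):
    """Return (before_filtering, after_filtering) lists of DiaObject rows."""
    before_filtering = []
    for pixel in pixel_ids:  # all pixels whose partitions overlap the CCD
        before_filtering.extend(session.execute(select_stmt, (pixel,)))
    # The partitions cover more sky than the CCD, so drop rows outside its region.
    after_filtering = [
        row for row in before_filtering
        if ccd_region_contains(row.ra, row.decl)
    ]
    return before_filtering, after_filtering
```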
Counts of objects stored in each table (again per CCD); this is stable (fluctuations are low as these are averaged over CCDs and multiple visits):
The next plots are all from Cassandra metrics.
Read latency for each individual table. Note the interesting step-like behavior for the source tables. I do not know exactly what this latency includes; the docs say it is "local" read latency, which probably means the latency of reading data on the node where the data is located, not including result collection by the coordinator. The regular drops in DiaObject latency are probably due to compaction.
Write latency, looks stable over long periods:
Total data volume for each separate table:
Data compression efficiency for each table type:
Total SSTable count; one logical table consists of one or more SSTables per node. Counts fluctuate due to compaction. The source table counts grow with time because we have per-month tables in this setup (see the sketch below).
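To clarify what "per-month tables" means here, a minimal sketch of one possible naming scheme where each source table gets a suffix counting months since an epoch; the epoch value and the exact naming are assumptions, not necessarily what ap_proto does.

```python
from datetime import datetime, timezone

# Assumed epoch for the test; the real value depends on the test configuration.
EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)

def month_index(visit_time: datetime) -> int:
    """Whole months elapsed between the epoch and the visit time."""
    return (visit_time.year - EPOCH.year) * 12 + (visit_time.month - EPOCH.month)

def source_table_name(base: str, visit_time: datetime) -> str:
    """E.g. source_table_name("DiaSource", t) -> "DiaSource_14" for month 14."""
    return f"{base}_{month_index(visit_time)}"
```

With this layout, every new month adds new tables (and their SSTables), which is why the total SSTable count for sources keeps growing over the run.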
Read repair rate, a colorful picture. What is interesting is that it happens mostly in the DiaForcedSource tables. I think this can be explained by concurrent reads and writes into the same table, which is sensitive to timing. The ap_proto timing is very tight and there is very little delay between reading and writing; in the AP pipeline this will probably be less of an issue. And I don't think it is causing problems at the current repair rate, as I do not see indications on other plots that it affects anything.
System load average; I do not see any outlier nodes like in the previous tests, everything looks more or less uniform:

This is all really great to see. One thing that would be useful to record here is some sense of the data volumes involved. E.g., what's the size of the result set for a visit, in bytes and number of records, and the data volume on disk after (roughly) N visits, etc. That would help in estimating things like the effective read bandwidth and potential storage cost trade-offs.
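Something like the following back-of-the-envelope helpers (purely illustrative, placeholder functions with no real numbers) is all I have in mind:

```python
def effective_read_bandwidth(result_bytes_per_visit: float, select_seconds: float) -> float:
    """Bytes per second delivered to the client by the per-visit select queries."""
    return result_bytes_per_visit / select_seconds

def storage_growth_per_visit(disk_bytes_after_n_visits: float, n_visits: int) -> float:
    """Average on-disk growth per visit (after compression and compaction)."""
    return disk_bytes_after_n_visits / n_visits
```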
Regarding the number of visits, I checked a handful of different opsim runs and they seem to expect about 200k-240k visits per year, depending on the actual survey strategy chosen. That should be a pretty solid estimate, and the big uncertainties like weather tend to be downside risks. So I think your visit model is pretty reasonable. The larger uncertainty is in the DIASource counts.