Before trying to run multiple Cassandra instances per host (with Docker), I decided to re-run the test with a reduced JVM memory allocation (96GB instead of 160GB, still with the G1 GC) and replication factor 2, which means twice as much data per node and twice as much load when storing the data. Some Cassandra options were changed too to account for the reduced JVM memory. I only ran it for 100k visits, but in general this configuration behaved better: I did not see any timeouts at all, and GC collection times were more reasonable.
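For reference, the heap change amounts to something like the following in cassandra-env.sh and jvm.options (only the 96GB heap size comes from this test; the pause-time target below is an assumed example, not one of the actual options changed):

```
# cassandra-env.sh: cap the JVM heap at 96GB instead of 160GB
MAX_HEAP_SIZE="96G"

# jvm.options: keep the G1 collector; the pause target is an assumed value
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
```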
Here are some plots from grafana:
Number of records stored and selected; these numbers are from the client side (ap_proto) and are consistent with the above plots:
A more interesting grafana plot, time to store data per CCD:
The numbers are ~50% higher compared to the previous run, which is likely due to replication; still, store times look significantly lower than read times.
Time to read data (per CCD):
These look consistent with the previous run. For reading I used consistency level ONE, meaning that a response from only one replica was requested, so it is probably similar to the single-replica case (but this is not what we should do in production).
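To make the read setting concrete, this is roughly how consistency level ONE looks when querying interactively through cqlsh (a sketch with a placeholder keyspace/table name, not the actual ap_proto query):

```sql
-- cqlsh: ask for a response from a single replica per read
CONSISTENCY ONE;
SELECT * FROM my_keyspace.my_table;
```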
And garbage collection time:
There are spikes here too, but on a much smaller scale.
And the plots from my notebook with visit number on the X axis.
Read time for three tables:
There are fewer outliers here compared to the previous case.
Combined fits for read time:
And combined fit for store time:
(Peculiar behavior: it seems to improve with visit number, though I don't think it will turn negative.)
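The trend line in the combined store-time fit is a simple linear fit, and a small negative slope would only extrapolate to negative times far outside the measured visit range. As a generic sketch of that behavior (with made-up numbers standing in for the real measurements), `numpy.polyfit` on a slightly decreasing series looks like:

```python
import numpy as np

# Hypothetical per-visit store times (seconds) with a slight downward
# trend plus noise -- NOT the real ap_proto measurements.
rng = np.random.default_rng(0)
visits = np.arange(1, 101)
store_time = 50.0 - 0.02 * visits + rng.normal(0.0, 0.5, visits.size)

# Linear fit: store_time ~ slope * visit + intercept
slope, intercept = np.polyfit(visits, store_time, 1)
print(f"slope={slope:.4f}, intercept={intercept:.2f}")

# The fitted line would only cross zero around visit = -intercept / slope,
# far beyond the range actually measured here.
zero_crossing = -intercept / slope
```

With numbers like these the slope is tiny compared to the intercept, so the zero crossing sits thousands of visits beyond the data, which matches the intuition that the fit will not actually turn negative.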