# Test APDB Cassandra prototype with native time partitioning


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s: None
• Labels:
• Story Points: 4
• Sprint: DB_S21_12
• Team: Data Access and Database
• Urgent?: No

#### Description

I want to run a quick test (~100k visits) using Cassandra partitioning for time instead of separate tables. This is just to compare performance; I think we still want separate tables for management reasons. The option already exists in the code, but it has not been tested recently.

#### Attachments

1. apdb-gcp6-nb-time-select-fit-100k-1.png
85 kB
2. apdb-gcp7-counter-queries.png
31 kB
3. apdb-gcp7-nb-time-select-cpu.png
48 kB
4. apdb-gcp7-nb-time-select-fit-100k.png
77 kB
5. apdb-gcp7-nb-time-select-iow.png
53 kB
6. apdb-gcp7-nb-time-select-real.png
54 kB
7. (filename missing)
40 kB
8. apdb-gcp7-timing-select-real.png
49 kB
9. apdb-gcp7-timing-store-real.png
75 kB
10. apdb-gcp7-write-latency-server-8.png
60 kB
11. (filename missing)
154 kB
12. apdb-gcp8-nb-time-select-cpu.png
47 kB
13. apdb-gcp8-nb-time-select-real.png
68 kB

#### Activity

Andy Salnikov added a comment - edited

It may be faster not to create individual small data frames for each query, but to concatenate the rows first and then make one single data frame out of them. The complication here is that we need to verify that each query returns the same set of columns in the same order. We do SELECT * on all tables, and I think in that case Cassandra returns data with the columns determined by the schema, though I can imagine some complicated scenarios where that could break.
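A minimal sketch of the two conversion strategies discussed above, assuming pandas is available; the column names and row values are invented for illustration and are not the actual APDB schema:

```python
# Sketch (not the actual APDB code): two ways to turn per-query result
# rows into a single pandas DataFrame. Columns/values are invented.
import pandas as pd

# Pretend each element is the result of one SELECT: (column_names, rows).
results = [
    (["diaSourceId", "ra", "dec"], [(1, 10.0, -5.0), (2, 11.0, -5.5)]),
    (["diaSourceId", "ra", "dec"], [(3, 12.0, -6.0)]),
]

# Approach 1: one small DataFrame per query, then pandas.concat().
frames = [pd.DataFrame(rows, columns=cols) for cols, rows in results]
df_concat = pd.concat(frames, ignore_index=True)

# Approach 2: concatenate raw rows first and build a single DataFrame,
# after verifying every query returned the same columns in the same order.
first_cols = results[0][0]
assert all(cols == first_cols for cols, _ in results), "column mismatch"
all_rows = [row for _, rows in results for row in rows]
df_single = pd.DataFrame(all_rows, columns=first_cols)

assert df_concat.equals(df_single)
```

Approach 2 avoids constructing (and then copying) many tiny intermediate DataFrames, at the cost of an explicit column-consistency check.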

Andy Salnikov added a comment -

Another short round of tests finished, with a small modification to how the query results are converted into pandas on the client side. Instead of converting each query's results separately and then doing pandas.concat(), I concatenate all rows and then convert all of them into pandas in one step (checking that the columns are identical for all results).

• total of 91242 visits generated
• first 79829 visits used a query per time partition (13 queries per visit per source table)
• remaining visits used a query per (time, spatial) partition pair (~200 queries per visit per source table)
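The fan-out difference between the two schemes can be sketched as follows; the partition counts come from the numbers above, while the table name, column names, and the 15 spatial partitions per region are assumptions for illustration:

```python
# Sketch of the two query fan-out schemes; ~13 time partitions cover the
# history window, and ~15 spatial partitions per region is assumed so
# that 13 * 15 = 195 ~ 200 queries per visit per source table.
from itertools import product

time_partitions = list(range(13))     # time partitions in the window
spatial_partitions = list(range(15))  # spatial partitions per region (assumed)

# Scheme 1: one query per time partition; all spatial partitions are
# listed in a single IN clause (13 queries per visit per source table).
queries_v1 = [
    f'SELECT * FROM "DiaSource" WHERE "apdb_time_part" = {t} '
    f'AND "apdb_part" IN ({", ".join(map(str, spatial_partitions))})'
    for t in time_partitions
]

# Scheme 2: one query per (time, spatial) partition pair
# (~200 queries per visit per source table).
queries_v2 = [
    f'SELECT * FROM "DiaSource" WHERE "apdb_time_part" = {t} '
    f'AND "apdb_part" = {s}'
    for t, s in product(time_partitions, spatial_partitions)
]

print(len(queries_v1), len(queries_v2))  # 13 195
```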

```
andy_salnikov_gmail_com@apdb-server-1:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.128.0.118  456.16 GiB  256     ?                 cfe4f9d8-7298-4256-92eb-9af257d7d7c7  rack1
UN  10.128.0.91   437.46 GiB  256     ?                 ad7b9ea8-893a-4f0c-82b4-3c188d09917c  rack1
UN  10.128.0.119  473 GiB     256     ?                 66b0a0d9-5a72-4c71-b68d-cde4d3f2bf20  rack1
UN  10.128.0.103  470.3 GiB   256     ?                 c0beaedd-70c7-4c47-a19a-072458b06cc6  rack1
UN  10.128.0.57   475.17 GiB  256     ?                 ddfed4bd-f7c2-41c4-ac89-f4a5fe85de2b  rack1
UN  10.128.0.123  425.7 GiB   256     ?                 00f2c39f-a896-4876-afe8-b27af06746b9  rack1
UN  10.128.0.101  468.57 GiB  256     ?                 048ac418-b8ee-4323-b292-a0461e0416f0  rack1
UN  10.128.0.121  429.41 GiB  256     ?                 9243bcf7-ac6b-4c56-98dc-0788d41d9c6c  rack1
UN  10.128.0.122  440.62 GiB  256     ?                 78939345-04ee-482c-84ce-ca5cdcf1f102  rack1
UN  10.128.0.100  454.89 GiB  256     ?                 f62254e7-fe4e-46e0-83fa-c62aef354630  rack1
UN  10.128.0.52   452.9 GiB   256     ?                 8fff5795-ae8c-4160-a6b5-63834a40edc9  rack1
UN  10.128.0.37   469.05 GiB  256     ?                 5be2e4a2-793f-420a-8b18-cbf7a2f3a369  rack1
```

Data sizes should be consistent with the other test; the most interesting change is in the timing (real/CPU) of read requests. Will add plots shortly.
Andy Salnikov added a comment -

With these changes, this is how the real time plot looks:

There is still a jump in the DiaSource plot where I switched from 13 queries to 200 queries, but it is much less pronounced. Interestingly, there is no observable change in DiaForcedSource select time across that switch.
Select time for DiaSource is consistent with the previous test.

On the CPU time plot:

the amplitude of that jump for DiaSource is higher than on the real time plot, but it is also much smaller than the CPU time in the previous test. The jump in CPU time is also present for DiaForcedSource, which shows no such jump in real time.
This suggests there is a lot more work happening in the Cassandra client for asynchronous request management; it is not clear whether that can be reduced, or whether a reduction would translate into reduced real time.
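The client-side bookkeeping in question comes from issuing many per-partition queries concurrently and gathering their results. A generic fan-out/gather sketch of that shape (not the actual cassandra-driver code, which uses execute_async() and ResponseFuture objects; the query function here is a stand-in):

```python
# Generic fan-out/gather sketch: submit many per-partition queries
# concurrently, then collect all rows. The real code goes through the
# Cassandra driver's async machinery; run_query() is a stand-in that
# just returns three fake rows per partition.
from concurrent.futures import ThreadPoolExecutor

def run_query(partition):
    # Stand-in for session.execute(select_for(partition)).
    return [(partition, i) for i in range(3)]

partitions = range(200)  # ~200 (time, spatial) partition pairs per visit
with ThreadPoolExecutor(max_workers=32) as pool:
    futures = [pool.submit(run_query, p) for p in partitions]
    # Gathering in submission order keeps rows grouped by partition.
    all_rows = [row for f in futures for row in f.result()]

print(len(all_rows))  # 600
```

Each outstanding request carries future/callback overhead on the client, which is one plausible source of the CPU-time jump when going from 13 to ~200 queries per visit.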

The conclusion from this exercise is that for a large number of queries it is definitely better to avoid intermediate conversion of individual results to pandas; it is more efficient to concatenate the results into a Python list and convert that to pandas in one go.

Andy Salnikov added a comment - edited

Another interesting plot is from Cassandra metrics. This shows request latency (response time) for all read/select queries:

The comb structure is because I do source reading for only 10% of the visits. For the first 80k visits these clearly take much longer than the other (DiaObject) queries; after the switch the comb is inverted and the source queries take less time. This plot is of course an average over individual queries, and we run many more queries after the switch than before. While this plot is interesting, it is hard to draw any conclusions from it; it is not entirely clear whether the bottleneck is in the processing of results on the client side or whether it is inherently a server-side effect.

Andy Salnikov added a comment -

I think I have collected enough info in these tests, though it is still unclear whether 200 queries can be executed faster than 13 queries. It looks like there may be potential if we could reduce client-side CPU, but that may need drastic measures, such as implementing part of the Cassandra client logic in C++/Cython (and I have not looked at the Cassandra client code yet).

For the next test I want to check whether a BLOB can give us any improvement compared to individual columns, at least for DiaSource. I think it could reduce CPU time (and maybe I/O) on the server side, but we may have to pay a heavier price for that on the client side.
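The BLOB idea can be sketched as packing all non-key columns of a row into one binary value on write and unpacking it on read; the format and the column set below are invented for illustration, not the real DiaSource schema:

```python
# Sketch of the BLOB scheme: the server stores and ships one opaque
# binary column per row instead of many typed columns; the client pays
# the decode cost. Format and columns here are hypothetical.
import struct

# Hypothetical DiaSource payload: (ra, dec, psFlux, psFluxErr) as
# four little-endian doubles (32 bytes total).
ROW_FMT = "<4d"

def pack_row(ra, dec, ps_flux, ps_flux_err):
    """Serialize the non-key columns into a single BLOB (write path)."""
    return struct.pack(ROW_FMT, ra, dec, ps_flux, ps_flux_err)

def unpack_row(blob):
    """Client-side cost of the scheme: decode on every read."""
    return struct.unpack(ROW_FMT, blob)

blob = pack_row(10.5, -5.25, 1e-6, 1e-8)
assert unpack_row(blob) == (10.5, -5.25, 1e-6, 1e-8)
```

The trade-off the test would probe: fewer cells for Cassandra to manage server-side, versus an extra serialization/deserialization step on the client.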

I'm going to close this ticket; there were some code changes on my development branch, which I self-reviewed as usual.


#### People

Assignee:
Andy Salnikov
Reporter:
Andy Salnikov
Watchers:
Andy Salnikov, Colin Slater, Fritz Mueller