  Data Management / DM-28522

Test APDB Cassandra prototype with native time partitioning


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Story Points: 4
    • Sprint: DB_S21_12
    • Team: Data Access and Database
    • Urgent?: No

      Description

      I want to run a quick test (~100k visits) using Cassandra partitioning for time instead of separate per-time tables. This is just to compare performance; I think we still want separate tables for management reasons. The option already exists in the code, but it has not been tested recently.
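
      For reference, a minimal sketch of what "partitioning instead of tables" means here; the keyspace, table, and column names below are assumptions for illustration, not the actual APDB schema:

      # A minimal sketch, not the actual APDB schema: keyspace "apdb" and the
      # column names below are assumptions used only for illustration.
      from cassandra.cluster import Cluster

      session = Cluster(["127.0.0.1"]).connect("apdb")

      # Instead of one table per time window (DiaSource_123, DiaSource_124, ...),
      # a single table carries the time bucket in its partition key, so Cassandra
      # does the time partitioning natively.
      session.execute("""
          CREATE TABLE IF NOT EXISTS "DiaSource" (
              apdb_part int,        -- spatial partition index
              apdb_time_part int,   -- time partition (e.g. month-sized bucket)
              "diaSourceId" bigint,
              "midPointTai" double,
              PRIMARY KEY ((apdb_part, apdb_time_part), "diaSourceId")
          )
      """)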

        Attachments

          Issue Links

            Activity

            Andy Salnikov added a comment - edited

            It may be faster not to create an individual small data frame for each query, but to concatenate the rows first and then build one single data frame from them. The complication is that we need to verify that each query returns the same set of columns in the same order. We do SELECT * on all tables, and I think in that case Cassandra returns data with the columns determined by the schema, though I can imagine some complicated scenarios where it could break.
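
            A minimal sketch of that approach (the helper name is mine, and relying on the driver's ResultSet.column_names is an assumption; the actual APDB code may differ):

            import pandas as pd

            def results_to_dataframe(results):
                """Build one DataFrame from several cassandra-driver result sets.

                Rows from all result sets are accumulated into a single list and
                converted to pandas exactly once, instead of one DataFrame per query.
                """
                all_rows = []
                columns = None
                for result in results:
                    # With SELECT * the column list should be fixed by the table
                    # schema, but verify it is the same set and order everywhere.
                    if columns is None:
                        columns = result.column_names
                    elif result.column_names != columns:
                        raise ValueError("column mismatch between query results")
                    all_rows.extend(result)   # rows come back as (named) tuples
                return pd.DataFrame(all_rows, columns=columns)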

            Andy Salnikov added a comment

            Another short round of tests finished, with a small modification to how the query results are converted into pandas on the client side. Instead of converting each query result separately and then doing pandas.concat(), I now concatenate all rows and convert them into pandas in one go (checking that the columns are identical for all results).

            • total of 91242 visits generated
            • first 79829 visits with a query per time partition (13 queries per visit per source table)
            • remaining visits with a query per time and spatial partition (~200 queries per visit per source table); see the sketch after this list
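
            A minimal sketch of that per-partition fan-out (the WHERE columns and the results_to_dataframe helper from the earlier sketch are my assumptions, not the actual APDB reader code):

            def select_sources(session, spatial_parts, time_parts):
                """Issue one asynchronous query per (spatial, time) partition.

                ``session`` is a cassandra-driver Session connected to the keyspace.
                """
                query = ('SELECT * FROM "DiaSource" '
                         'WHERE apdb_part = %s AND apdb_time_part = %s')
                futures = [session.execute_async(query, (sp, tp))
                           for sp in spatial_parts
                           for tp in time_parts]
                # Roughly 13 futures when querying per time partition only, ~200
                # when querying per time and spatial partition; all results are
                # then converted to pandas in one go.
                return results_to_dataframe(f.result() for f in futures)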

            andy_salnikov_gmail_com@apdb-server-1:~$ nodetool status
            Datacenter: datacenter1
            =======================
            Status=Up/Down
            |/ State=Normal/Leaving/Joining/Moving
            --  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
            UN  10.128.0.118  456.16 GiB  256     ?                 cfe4f9d8-7298-4256-92eb-9af257d7d7c7  rack1
            UN  10.128.0.91   437.46 GiB  256     ?                 ad7b9ea8-893a-4f0c-82b4-3c188d09917c  rack1
            UN  10.128.0.119  473 GiB     256     ?                 66b0a0d9-5a72-4c71-b68d-cde4d3f2bf20  rack1
            UN  10.128.0.103  470.3 GiB   256     ?                 c0beaedd-70c7-4c47-a19a-072458b06cc6  rack1
            UN  10.128.0.57   475.17 GiB  256     ?                 ddfed4bd-f7c2-41c4-ac89-f4a5fe85de2b  rack1
            UN  10.128.0.123  425.7 GiB   256     ?                 00f2c39f-a896-4876-afe8-b27af06746b9  rack1
            UN  10.128.0.101  468.57 GiB  256     ?                 048ac418-b8ee-4323-b292-a0461e0416f0  rack1
            UN  10.128.0.121  429.41 GiB  256     ?                 9243bcf7-ac6b-4c56-98dc-0788d41d9c6c  rack1
            UN  10.128.0.122  440.62 GiB  256     ?                 78939345-04ee-482c-84ce-ca5cdcf1f102  rack1
            UN  10.128.0.100  454.89 GiB  256     ?                 f62254e7-fe4e-46e0-83fa-c62aef354630  rack1
            UN  10.128.0.52   452.9 GiB   256     ?                 8fff5795-ae8c-4160-a6b5-63834a40edc9  rack1
            UN  10.128.0.37   469.05 GiB  256     ?                 5be2e4a2-793f-420a-8b18-cbf7a2f3a369  rack1
            

            Data sizes should be consistent with the other test; the most interesting change is in the timing (real/CPU) of read requests. Will add plots shortly.

            Andy Salnikov added a comment

            With these changes, this is how the real-time plot looks:

            [real-time plot attached to the issue]

            There is still a jump on the DiaSource plot where I switched from 13 queries to 200 queries, but it is much less pronounced. Interestingly, there is no observable change in the DiaForcedSource select time across that switch.
            The select time for DiaSource is consistent with the previous test.

            On the CPU time plot:

            [CPU time plot attached to the issue]

            The amplitude of that jump for DiaSource is higher than in the real-time plot, but it is also much smaller than the CPU time in the previous test. The jump in CPU time is also present for DiaForcedSource, which shows no such jump in real time.
            This suggests there is a lot more work happening in the Cassandra client for asynchronous request management; it is not clear whether that can be reduced, or whether a reduction would translate into reduced real time.

            The conclusion from this exercise is that, for a large number of queries, it is definitely better to avoid the intermediate conversion of individual results to pandas; it is more efficient to concatenate the results as a Python list and convert that to pandas in one go.
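
            For what it is worth, that comparison can be reproduced off-line with synthetic rows; this sketch (all data made up, no numbers quoted from the test) just times the two conversion strategies:

            import timeit
            import pandas as pd

            # Synthetic stand-in for ~200 small query results of 50 rows each.
            results = [[(i, float(i), "x" * 8) for i in range(50)] for _ in range(200)]
            columns = ["id", "value", "payload"]

            def per_query_concat():
                # One DataFrame per result, then pandas.concat() - the old approach.
                return pd.concat([pd.DataFrame(r, columns=columns) for r in results],
                                 ignore_index=True)

            def single_conversion():
                # Concatenate plain Python rows first, convert to pandas once.
                rows = []
                for r in results:
                    rows.extend(r)
                return pd.DataFrame(rows, columns=columns)

            print("per-query concat: ", timeit.timeit(per_query_concat, number=100))
            print("single conversion:", timeit.timeit(single_conversion, number=100))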

            Andy Salnikov added a comment - edited

            Another interesting plot comes from the Cassandra metrics. This shows the request latency (response time) for all read/select queries:

            [request latency plot attached to the issue]

            The comb structure appears because I do source reading for only 10% of the visits. For the first 80k visits these clearly take much longer than the other (DiaObject) queries; after the switch the comb is inverted and the source queries take less time. This plot is of course an average over individual queries, and we run many more queries after the switch than before. While the plot is interesting, it is hard to draw any conclusions from it: it is not entirely clear whether the bottleneck is in the processing of results on the client side or whether it is inherently a server-side effect.

            Andy Salnikov added a comment

            I think I have collected enough information in these tests, though it is still unclear whether 200 queries can be executed faster than 13 queries. It looks like there may be potential gains if we could reduce client-side CPU, but that may need some drastic measures, such as implementing part of the Cassandra client logic in C++/Cython (and I have not looked at the Cassandra client code yet).

            For the next test I want to check whether a BLOB can give us any improvement over individual columns, for DiaSource at least. I think it could reduce CPU time (and maybe I/O) on the server side, but we may have to pay a heavier price for that on the client side.
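
            A minimal sketch of what the BLOB variant could look like on the client side (the column subset and the numpy packing are my assumptions; the actual encoding is still to be decided):

            import numpy as np

            # Hypothetical subset of DiaSource columns packed into one binary value.
            record_dtype = np.dtype([("diaSourceId", np.int64),
                                     ("midPointTai", np.float64),
                                     ("psFlux", np.float64)])

            def pack(records):
                # Client packs the per-source columns into a single blob on write ...
                return np.asarray(records, dtype=record_dtype).tobytes()

            def unpack(blob):
                # ... and unpacks it on read; the server then handles one blob
                # column per row instead of many typed columns.
                return np.frombuffer(blob, dtype=record_dtype)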

            I am going to close this ticket; there were some code changes on my development branch that I self-reviewed, as usual.


              People

              Assignee: Andy Salnikov
              Reporter: Andy Salnikov
              Watchers: Andy Salnikov, Colin Slater, Fritz Mueller
              Votes: 0
              Watchers: 3

                Dates

                Created:
                Updated:
                Resolved:

                  CI Builds

                  No builds found.