Data Management / DM-28172

Test APDB Cassandra prototype with 12 server nodes


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Story Points: 10
    • Sprint: DB_S21_12
    • Team: Data Access and Database
    • Urgent?: No

      Description

      Time to scale the Cassandra cluster again; I think 10 is a reasonable number. I also want to do a 1-year test, for which we expect 24TB of data for 3 replicas; with some overhead, 3TB per server node is probably OK. But it may be interesting to look at the dynamics after 1 year (if we don't run out of money), so 4TB per node is probably better. I think local SSDs come in multiples of 4: 8*375GiB = 3000GiB, 12*375GiB = 4500GiB. I don't think it's possible to attach more disks without stopping the instance, which destroys the data, so we need to plan capacity ahead.
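
      A quick back-of-the-envelope check of the numbers above (plain arithmetic, illustrative only, not ap_proto code):

       SSD_GIB = 375                    # GCP local SSDs come in 375 GiB units

       one_year_tb = 24.0               # expected 1-year volume with 3 replicas
       for nodes in (10, 12):
           print(f"{nodes} nodes -> {one_year_tb / nodes:.1f} TB/node before overhead")
       # 10 nodes -> 2.4 TB/node, 12 nodes -> 2.0 TB/node

       for n_disks in (8, 12):
           print(f"{n_disks} disks -> {n_disks * SSD_GIB} GiB raw per node")
       # 8 disks -> 3000 GiB, 12 disks -> 4500 GiB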

        Attachments

        1. apdb-gcp4-counter-select-obj-src.png (55 kB)
        2. apdb-gcp4-counter-select-obj-src-fsrc.png (60 kB)
        3. apdb-gcp4-counter-stored.png (40 kB)
        4. apdb-gcp4-latency-read-diaobjectlast.png (129 kB)
        5. apdb-gcp4-latency-read-fsrc.png (150 kB)
        6. apdb-gcp4-latency-read-src.png (98 kB)
        7. apdb-gcp4-latency-write-diaobject.png (95 kB)
        8. apdb-gcp4-latency-write-diaobjectlast.png (101 kB)
        9. apdb-gcp4-latency-write-fsrc.png (90 kB)
        10. apdb-gcp4-latency-write-src.png (84 kB)
        11. apdb-gcp4-nb-time-avg.png (42 kB)
        12. apdb-gcp4-nb-time-scatter.png (81 kB)
        13. apdb-gcp4-nb-time-select-cpu.png (56 kB)
        14. apdb-gcp4-nb-time-select-fit-100k.png (77 kB)
        15. apdb-gcp4-nb-time-select-fit-50k.png (81 kB)
        16. apdb-gcp4-nb-time-select-real.png (57 kB)
        17. apdb-gcp4-nb-time-store-cpu.png (58 kB)
        18. apdb-gcp4-nb-time-store-cpu-total.png (46 kB)
        19. apdb-gcp4-nb-time-store-real.png (52 kB)
        20. apdb-gcp4-read-repair-rate-liny.png (114 kB)
        21. apdb-gcp4-read-repair-rate-logy.png (145 kB)
        22. apdb-gcp4-system-load.png (247 kB)
        23. apdb-gcp4-table-compression.png (49 kB)
        24. apdb-gcp4-table-count.png (59 kB)
        25. apdb-gcp4-table-size.png (54 kB)
        26. apdb-gcp4-timing-select-real.png (65 kB)
        27. apdb-gcp4-timing-store-real.png (146 kB)

          Issue Links

            This issue is triggered by DM-28154

            Activity

            Andy Salnikov created issue -
            Andy Salnikov made changes -
            Link: This issue is triggered by DM-28154
            Andy Salnikov added a comment -

            Actually the number of SSDs can be 4, 8, 16, or 24; 12 is not possible. 10 VMs with 16 disks cost about the same as 12 VMs with 8 disks. I need to think about what other factors may be important. Having too many disks is probably not a nice thing for Cassandra. I think I'll probably start with 12 machines and 8 disks each. That should be enough capacity for maybe 15 months.

            Andy Salnikov added a comment -

            I managed to make 8 instances with 8 SSDs each (8 × 8 × 375 GiB = 24,000 GiB); trying to make another instance gives an error:

            Quota 'LOCAL_SSD_TOTAL_GB' exceeded. Limit: 25000.0 in region us-central1.
            

            Andy Salnikov added a comment -

            After the quota was increased I made 12 server VMs; Cassandra started OK and I'm running my test now. One odd thing that I noticed: JMX monitoring seems to be missing some info on all instances; in particular, per-table monitoring info is missing (or actually only exists for the ApdbProtoVisits table, but not for any other table). It looks like an issue on the Cassandra side this time; need to investigate.

            Andy Salnikov added a comment -

            After 50k visits, at approximately 2021-01-06 19:10Z, I restarted Cassandra on each node and also the JMX monitor. Will see if this fixes the monitoring issues.

            Andy Salnikov added a comment -

            While I'm watching how things progress slowly, here are a couple of ideas to try after this long test:

            • In the current setup I do time "partitioning" using a separate table for each month (30 days), so I am running 13 queries in parallel to retrieve DiaSources. Although the data now suggests that time is proportional to the volume of data and not to the number of tables holding it, it may still be worth trying coarser partitioning, e.g. one table per 60 days. Or try time partitioning without separate tables (tables are only needed if we want to push older data to slower/cheaper storage).
            • DiaSource tables are pretty wide, but we never query individual columns, so it is possible that we have overhead from storing separate columns which could be avoided if we pack the data on the client side into some BLOB format. Cassandra is probably doing a similar thing already, but we have larger scale on the client side, so we can parallelize more efficiently if we do this packing on the client. Most of the columns are never updated; I think the only exception is diaObjectId/ssObjectId, so we could pack all non-mutable columns into a BLOB and store it as a single column. The BLOB format needs to be portable (so that non-Python clients can parse it) but efficient. JSON is portable but it sucks; some binary format should work better (BSON, CBOR, etc.; see the sketch below).
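
            A minimal sketch of the packing idea, assuming the cbor2 library; the column names are illustrative, not the actual APDB schema:

             import cbor2

             # Columns that can be updated stay as regular columns (hypothetical set).
             MUTABLE = {"diaSourceId", "diaObjectId", "ssObjectId"}

             def pack_record(record: dict) -> tuple[dict, bytes]:
                 """Split a DiaSource record into mutable columns and a packed blob."""
                 mutable = {k: v for k, v in record.items() if k in MUTABLE}
                 immutable = {k: v for k, v in record.items() if k not in MUTABLE}
                 return mutable, cbor2.dumps(immutable)

             def unpack_record(mutable: dict, blob: bytes) -> dict:
                 """Reconstruct the full record on the client side."""
                 record = cbor2.loads(blob)
                 record.update(mutable)
                 return record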
            Andy Salnikov made changes -
            Summary: "Test APDB Cassandra prototype with 10 server nodes" → "Test APDB Cassandra prototype with 12 server nodes"
            Andy Salnikov added a comment -

            It ran without issues for 450k visits; stopping it and getting summaries.

            Data size on each node:

             apdb-server-1: Filesystem      Size  Used Avail Use% Mounted on
             apdb-server-1: /dev/nvme0n1p1  375G  267G  108G  72% /data/apdb1
             apdb-server-1: /dev/nvme0n2p1  375G  264G  111G  71% /data/apdb2
             apdb-server-1: /dev/nvme0n3p1  375G  266G  110G  71% /data/apdb3
             apdb-server-1: /dev/nvme0n4p1  375G  265G  111G  71% /data/apdb4
             apdb-server-1: /dev/nvme0n5p1  375G  263G  113G  70% /data/apdb5
             apdb-server-1: /dev/nvme0n6p1  375G  265G  111G  71% /data/apdb6
             apdb-server-1: /dev/nvme0n7p1  375G  262G  114G  70% /data/apdb7
             apdb-server-1: /dev/nvme0n8p1  375G  270G  106G  72% /data/apdb8
             apdb-server-2: Filesystem      Size  Used Avail Use% Mounted on
             apdb-server-2: /dev/nvme0n1p1  375G  284G   92G  76% /data/apdb1
             apdb-server-2: /dev/nvme0n2p1  375G  284G   91G  76% /data/apdb2
             apdb-server-2: /dev/nvme0n3p1  375G  283G   93G  76% /data/apdb3
             apdb-server-2: /dev/nvme0n4p1  375G  287G   88G  77% /data/apdb4
             apdb-server-2: /dev/nvme0n5p1  375G  287G   89G  77% /data/apdb5
             apdb-server-2: /dev/nvme0n6p1  375G  279G   97G  75% /data/apdb6
             apdb-server-2: /dev/nvme0n7p1  375G  286G   90G  77% /data/apdb7
             apdb-server-2: /dev/nvme0n8p1  375G  284G   92G  76% /data/apdb8
             apdb-server-4: Filesystem      Size  Used Avail Use% Mounted on
             apdb-server-4: /dev/nvme0n1p1  375G  294G   82G  79% /data/apdb1
             apdb-server-4: /dev/nvme0n2p1  375G  294G   82G  79% /data/apdb2
             apdb-server-4: /dev/nvme0n3p1  375G  294G   81G  79% /data/apdb3
             apdb-server-4: /dev/nvme0n4p1  375G  293G   83G  78% /data/apdb4
             apdb-server-4: /dev/nvme0n5p1  375G  296G   80G  79% /data/apdb5
             apdb-server-4: /dev/nvme0n6p1  375G  294G   81G  79% /data/apdb6
             apdb-server-4: /dev/nvme0n7p1  375G  293G   83G  79% /data/apdb7
             apdb-server-4: /dev/nvme0n8p1  375G  294G   82G  79% /data/apdb8
             apdb-server-9: Filesystem      Size  Used Avail Use% Mounted on
             apdb-server-9: /dev/nvme0n1p1  375G  316G   60G  85% /data/apdb1
             apdb-server-9: /dev/nvme0n2p1  375G  311G   65G  83% /data/apdb2
             apdb-server-9: /dev/nvme0n3p1  375G  311G   65G  83% /data/apdb3
             apdb-server-9: /dev/nvme0n4p1  375G  313G   63G  84% /data/apdb4
             apdb-server-9: /dev/nvme0n5p1  375G  313G   63G  84% /data/apdb5
             apdb-server-9: /dev/nvme0n6p1  375G  312G   64G  84% /data/apdb6
             apdb-server-9: /dev/nvme0n7p1  375G  312G   64G  83% /data/apdb7
             apdb-server-9: /dev/nvme0n8p1  375G  308G   68G  83% /data/apdb8
            apdb-server-10: Filesystem      Size  Used Avail Use% Mounted on
            apdb-server-10: /dev/nvme0n1p1  375G  288G   88G  77% /data/apdb1
            apdb-server-10: /dev/nvme0n2p1  375G  281G   94G  75% /data/apdb2
            apdb-server-10: /dev/nvme0n3p1  375G  282G   94G  76% /data/apdb3
            apdb-server-10: /dev/nvme0n4p1  375G  279G   97G  75% /data/apdb4
            apdb-server-10: /dev/nvme0n5p1  375G  282G   94G  76% /data/apdb5
            apdb-server-10: /dev/nvme0n6p1  375G  282G   93G  76% /data/apdb6
            apdb-server-10: /dev/nvme0n7p1  375G  284G   91G  76% /data/apdb7
            apdb-server-10: /dev/nvme0n8p1  375G  285G   91G  76% /data/apdb8
             apdb-server-3: Filesystem      Size  Used Avail Use% Mounted on
             apdb-server-3: /dev/nvme0n1p1  375G  293G   83G  79% /data/apdb1
             apdb-server-3: /dev/nvme0n2p1  375G  288G   87G  77% /data/apdb2
             apdb-server-3: /dev/nvme0n3p1  375G  287G   89G  77% /data/apdb3
             apdb-server-3: /dev/nvme0n4p1  375G  285G   91G  76% /data/apdb4
             apdb-server-3: /dev/nvme0n5p1  375G  287G   88G  77% /data/apdb5
             apdb-server-3: /dev/nvme0n6p1  375G  288G   88G  77% /data/apdb6
             apdb-server-3: /dev/nvme0n7p1  375G  289G   87G  77% /data/apdb7
             apdb-server-3: /dev/nvme0n8p1  375G  288G   87G  77% /data/apdb8
             apdb-server-8: Filesystem      Size  Used Avail Use% Mounted on
             apdb-server-8: /dev/nvme0n1p1  375G  284G   92G  76% /data/apdb1
             apdb-server-8: /dev/nvme0n2p1  375G  287G   89G  77% /data/apdb2
             apdb-server-8: /dev/nvme0n3p1  375G  288G   88G  77% /data/apdb3
             apdb-server-8: /dev/nvme0n4p1  375G  286G   90G  77% /data/apdb4
             apdb-server-8: /dev/nvme0n5p1  375G  283G   93G  76% /data/apdb5
             apdb-server-8: /dev/nvme0n6p1  375G  287G   89G  77% /data/apdb6
             apdb-server-8: /dev/nvme0n7p1  375G  285G   91G  76% /data/apdb7
             apdb-server-8: /dev/nvme0n8p1  375G  292G   83G  78% /data/apdb8
             apdb-server-5: Filesystem      Size  Used Avail Use% Mounted on
             apdb-server-5: /dev/nvme0n1p1  375G  276G  100G  74% /data/apdb1
             apdb-server-5: /dev/nvme0n2p1  375G  277G   99G  74% /data/apdb2
             apdb-server-5: /dev/nvme0n3p1  375G  278G   98G  74% /data/apdb3
             apdb-server-5: /dev/nvme0n4p1  375G  271G  105G  73% /data/apdb4
             apdb-server-5: /dev/nvme0n5p1  375G  277G   99G  74% /data/apdb5
             apdb-server-5: /dev/nvme0n6p1  375G  268G  108G  72% /data/apdb6
             apdb-server-5: /dev/nvme0n7p1  375G  273G  102G  73% /data/apdb7
             apdb-server-5: /dev/nvme0n8p1  375G  286G   90G  77% /data/apdb8
             apdb-server-7: Filesystem      Size  Used Avail Use% Mounted on
             apdb-server-7: /dev/nvme0n1p1  375G  287G   89G  77% /data/apdb1
             apdb-server-7: /dev/nvme0n2p1  375G  284G   91G  76% /data/apdb2
             apdb-server-7: /dev/nvme0n3p1  375G  285G   91G  76% /data/apdb3
             apdb-server-7: /dev/nvme0n4p1  375G  287G   89G  77% /data/apdb4
             apdb-server-7: /dev/nvme0n5p1  375G  285G   90G  77% /data/apdb5
             apdb-server-7: /dev/nvme0n6p1  375G  287G   89G  77% /data/apdb6
             apdb-server-7: /dev/nvme0n7p1  375G  286G   90G  77% /data/apdb7
             apdb-server-7: /dev/nvme0n8p1  375G  282G   93G  76% /data/apdb8
            apdb-server-11: Filesystem      Size  Used Avail Use% Mounted on
            apdb-server-11: /dev/nvme0n1p1  375G  269G  107G  72% /data/apdb1
            apdb-server-11: /dev/nvme0n2p1  375G  269G  106G  72% /data/apdb2
            apdb-server-11: /dev/nvme0n3p1  375G  266G  110G  71% /data/apdb3
            apdb-server-11: /dev/nvme0n4p1  375G  266G  110G  71% /data/apdb4
            apdb-server-11: /dev/nvme0n5p1  375G  268G  108G  72% /data/apdb5
            apdb-server-11: /dev/nvme0n6p1  375G  268G  107G  72% /data/apdb6
            apdb-server-11: /dev/nvme0n7p1  375G  267G  109G  72% /data/apdb7
            apdb-server-11: /dev/nvme0n8p1  375G  268G  107G  72% /data/apdb8
             apdb-server-6: Filesystem      Size  Used Avail Use% Mounted on
             apdb-server-6: /dev/nvme0n1p1  375G  270G  106G  72% /data/apdb1
             apdb-server-6: /dev/nvme0n2p1  375G  269G  107G  72% /data/apdb2
             apdb-server-6: /dev/nvme0n3p1  375G  271G  105G  73% /data/apdb3
             apdb-server-6: /dev/nvme0n4p1  375G  264G  111G  71% /data/apdb4
             apdb-server-6: /dev/nvme0n5p1  375G  269G  106G  72% /data/apdb5
             apdb-server-6: /dev/nvme0n6p1  375G  264G  112G  71% /data/apdb6
             apdb-server-6: /dev/nvme0n7p1  375G  271G  105G  73% /data/apdb7
             apdb-server-6: /dev/nvme0n8p1  375G  276G  100G  74% /data/apdb8
            apdb-server-12: Filesystem      Size  Used Avail Use% Mounted on
            apdb-server-12: /dev/nvme0n1p1  375G  289G   87G  77% /data/apdb1
            apdb-server-12: /dev/nvme0n2p1  375G  286G   90G  77% /data/apdb2
            apdb-server-12: /dev/nvme0n3p1  375G  289G   87G  77% /data/apdb3
            apdb-server-12: /dev/nvme0n4p1  375G  294G   82G  79% /data/apdb4
            apdb-server-12: /dev/nvme0n5p1  375G  284G   92G  76% /data/apdb5
            apdb-server-12: /dev/nvme0n6p1  375G  285G   91G  76% /data/apdb6
            apdb-server-12: /dev/nvme0n7p1  375G  287G   89G  77% /data/apdb7
            apdb-server-12: /dev/nvme0n8p1  375G  283G   93G  76% /data/apdb8
            

            Total used size is 28452929528 KiB, or 26.5 TiB.

            $ nodetool status
            Datacenter: datacenter1
            =======================
            Status=Up/Down
            |/ State=Normal/Leaving/Joining/Moving
            --  Address       Load      Tokens  Owns (effective)  Host ID                               Rack
            UN  10.128.0.91   2.29 TiB  256     ?                 fa91ae2e-2cc4-434a-97e1-b6ae7c6c2137  rack1
            UN  10.128.0.118  2.22 TiB  256     ?                 91128571-d044-4502-b5e4-d463261c70e6  rack1
            UN  10.128.0.119  2.23 TiB  256     ?                 e5ea474a-65e4-4c14-bdbe-b45ab28209c2  rack1
            UN  10.128.0.103  2.43 TiB  256     ?                 9cc42ef8-f13c-4023-9385-62104c9737a1  rack1
            UN  10.128.0.57   2.24 TiB  256     ?                 8c1c51c2-bd81-494f-aae2-726c3b8d335d  rack1
            UN  10.128.0.101  2.1 TiB   256     ?                 bd368072-f7cf-49e7-af49-4ddf5ec52c57  rack1
            UN  10.128.0.123  2.23 TiB  256     ?                 06b1e44f-ecee-49cb-bad3-02a7825e92cb  rack1
            UN  10.128.0.121  2.2 TiB   256     ?                 fd955a68-4962-4543-8126-6d6ac6d5883f  rack1
            UN  10.128.0.100  2.15 TiB  256     ?                 a3206eaf-54e1-4ba1-aa28-66ee55c61a66  rack1
            UN  10.128.0.122  2.08 TiB  256     ?                 7c9a6d53-f926-4caf-8da7-6148f0d22e25  rack1
            UN  10.128.0.52   2.21 TiB  256     ?                 07f25954-4374-401a-a9fd-4227e12cdb3a  rack1
            UN  10.128.0.37   2.06 TiB  256     ?                 fe45f1b3-dc63-4570-9e30-b36f5cd9b5fb  rack1
            

            Total Load size (26.44 TiB) is consistent with df.

            Andy Salnikov added a comment (edited) -

            Summary of the test:

            • 12 server nodes, 32 cores, 64GB RAM each
            • 8x375GiB local SSD storage on each node
            • replication factor 3, QUORUM consistency for reads and writes
            • 6 client machines, 32 cores each
            • 450k visits generated
            • no errors observed on client side

            The purpose of the test was to understand what happens after we reach 12 months' worth of visits; I expect read time to grow ~linearly and then plateau at a stable level after 12 months. This is based on what we saw in earlier tests, where time grows linearly with the number of visits for the source and forced source tables. In my setup I generate 800 visits per night (a 10-hour night at 45 seconds per visit), which is probably an underestimate; I think 900-1000 visits per night may be closer to reality. So in my case 12 months correspond to 288k visits, and 13 months to 312k visits (as we always read 13 months, the actual plateau depends on where we start writing data relative to time partition boundaries).
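
            The visit arithmetic behind these numbers (plain arithmetic, not ap_proto code):

             night_hours, visit_seconds = 10, 45
             visits_per_night = night_hours * 3600 // visit_seconds  # = 800
             print(visits_per_night * 30 * 12)  # 288000 visits in 12 months (30-day months)
             print(visits_per_night * 30 * 13)  # 312000 visits in 13 months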

            This first plot is for the first 100k visits, just to compare performance with DM-28154:

            The growth looks linear again, like in previous tests. Here is the comparison with the previous 6-node test (DM-28154) for select timing at 100k visits:

                                                  6 nodes  12 nodes
            obj_select_real at 100k visits, sec      0.23      0.12
            src_select_real at 100k visits, sec      1.96      1.13
            fsrc_select_real at 100k visits, sec     1.07      0.40

            There is a significant improvement: not exactly a factor of two in the case of DIASource, but closer to it for the other two tables.
            More plots to follow...

            Andy Salnikov added a comment -

            Here is the scatter plot for select and store times. Select time is the sum of real time for database operations (including client time for converting the data into pandas format). Store time is dominated by client CPU time doing conversions from pandas into queries.

            This is the same plot but plotting averages across 5k visits:

            Clearly read time plateaued after ~300k visits. Variations in the plateau are due to aliasing with 1-month partitioning (we always read complete partitions). Store time stays low though it seems to grow slowly.

            Andy Salnikov added a comment -

            Detailed per-table select time plots.

            Here is the real time for selects for individual tables (the total is just the sum of the three other values):

            There is a visible deviation from linearity for both source tables, though the total still looks reasonably linear for the first 12 months. Object select time seems to continue growing slowly even after 12 months.

            Here is the CPU time on the client side, which also contributes to the numbers above:

            CPU time is still a significant contribution (I think this explains why there is no factor-of-2 improvement for the DIASource table compared to the 6-node cluster). Growth in DIAObject real time is probably explained by the growth of CPU time.

            Andy Salnikov added a comment -

            And plots for store times for individual tables:

            This is real time; the per-table times only measure the time needed to run the insert queries, not to convert the data from pandas into INSERTs. The total time, on the other hand, includes client time for conversion:

            Here is per-table CPU time; this again only covers the code that sends queries to Cassandra:

            And here is the total CPU time (corresponding to the total green markers on the real time plot):

            Colin Slater added a comment -

            This is all really great to see. One thing that would be useful to record here is some sense of the data volumes involved. E.g., what's the size of the result set for a visit, in bytes and number of records, and the data volume on disk after (roughly) N visits, etc. That would help in estimating things like the effective read bandwidth and potential storage cost trade-offs.

            Regarding the number of visits, I checked a handful of different opsim runs and they seem to expect about 200k-240k visits per year, depending on the actual survey strategy chosen. That should be a pretty solid estimate, and the big uncertainties like weather tend to be downside risks. So I think your visit model is pretty reasonable. The larger uncertainty is the DIASource counts.

            Andy Salnikov added a comment (edited) -

            Colin, I'm going to add more plots from Grafana soon; there is some interesting info there. For data volume, I think we are consistent with the earlier estimate from DMTN-156, which is 2TiB per 100k visits per replica (27TiB estimated for 450k visits vs. the actual 26.5 TiB from the numbers above). For record size: ap_proto is based on the DPDD schema, so it should be easy to estimate the size on the client side, but the representation used for storage or transfer will add some overhead, which can be significant. I'll try to see how to estimate per-table storage from monitoring data (or add it to future tests).
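
            The consistency check is simple arithmetic:

             tib_per_100k_per_replica = 2.0            # DMTN-156 estimate
             visits, replicas = 450_000, 3
             print(tib_per_100k_per_replica * visits / 100_000 * replicas)
             # 27.0 TiB estimated vs. 26.5 TiB measured above (all replicas)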

             

            Andy Salnikov added a comment (edited) -

            A bunch of plots from Grafana; these are all time series covering the whole test. Most of them are based on metrics produced by Cassandra; some come from ap_proto logs (the same data as in the plots above).

            Timing of the queries for selecting and storing (real/wall clock time):

            Counts of the objects selected per visit and per CCD. The first plot shows DiaSource and DiaObject; for DiaObject two counts are shown, before and after filtering. Before filtering means all records returned from a query, which includes everything covered by the spatial partitions enclosing the CCD region. After filtering means the records inside the CCD region. After 12 months DiaSource counts have plateaued; DiaObject continues to rise. The second plot includes DiaForcedSource:

            Counts of objects stored in each table (again per CCD); this is stable (fluctuations are low as these are averaged over CCDs and multiple visits):

            The next plots are all from Cassandra metrics.

            Read latency for each individual table. Interesting step-like behavior for the source tables. I do not know exactly what this latency includes; the docs say it's "local" read latency, which probably means the latency of reading data on the node where the data is located, not including result collection by the coordinator. Regular drops in DiaObject latency are probably due to compaction.


            Write latency looks stable over long periods:



            Total data volume for each separate table:

            Data compression efficiency for each table type:

            Total SSTable count; one logical table consists of one or more SSTables per node. Counts fluctuate due to compaction. Source counters grow with time because we have per-month tables in this setup.

            Read repair rate, a colorful picture. What is interesting is that it happens mostly in DiaForcedSource tables. I think this can be explained by concurrent reads and writes into the same table, which is sensitive to timing. ap_proto timing is very tight, with very little delay between reading and writing; in the AP pipeline this will probably be less of an issue. And I don't think this is causing problems at the current repair rate; I see no indication on the other plots that it affects anything.

            System load average; I do not see any outstanding nodes like in the previous tests, all looks more or less uniform:

            Andy Salnikov added a comment -

            A few more numbers for data size for each individual table (note that we have 3 replicas; size on disk should be divided by 3 to get the size of one replica). Cassandra compresses data on disk; ap_proto generates random payload for many fields, which can affect compression efficiency. Also, the schema has multiple BLOB columns; the calculation below assumes ~20 bytes per BLOB on average.

            Table            Record count  Size on disk  Stored record size  Schema size
            DiaObject        6.6E9         12.46 TiB     644 B               624 B
            DiaObjectLast    -             746 GiB       -                   100 B
            DiaSource        6.6E9         10.24 TiB     530 B               504 B
            DiaForcedSource  32.73E9       3.01 TiB      31.4 B              60 B

            I did not count DiaObjectLast records (and cannot reconstruct the count from the logs).

            Stored data sizes seem to be consistent with the compression ratios on the plot above.

            One concern that I have is the ever-increasing number of DiaObjects that we read from the database. I guess this is probably expected in the current model, as there is no cutoff on the history of the DiaObjects. If this continues to grow linearly, the timing for that will become important after several years.

            Another random thought: DiaObject and DiaForcedSource tables are write-only, but we are going to update DiaSource (detaching sources from DiaObject and attaching them to SSObject). Cassandra's append-only data model could potentially make this more expensive when reading Sources back (which is already the slowest operation). It may be worth researching an option where the relation is not part of the table itself but lives in a separate, very narrow table (see the sketch below).
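
            A minimal sketch of that idea with the Python cassandra-driver; the keyspace, contact point, and table/column names are assumptions for illustration:

             from cassandra.cluster import Cluster

             session = Cluster(["10.128.0.91"]).connect("apdb")

             # A very narrow table holding only the DiaSource-to-SSObject relation,
             # so reassociation never rewrites the wide DiaSource rows.
             session.execute("""
                 CREATE TABLE IF NOT EXISTS "DiaSourceToSSObject" (
                     "diaSourceId" bigint PRIMARY KEY,
                     "ssObjectId"  bigint
                 )
             """)

             # Reassociating a source is then a tiny insert, not a wide-row update:
             session.execute(
                 'INSERT INTO "DiaSourceToSSObject" ("diaSourceId", "ssObjectId") '
                 'VALUES (%s, %s)',
                 (1234567, 7654321),
             )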

            Andy Salnikov added a comment -

            To summarize my findings before I switch to other tests:

            • Performance seems to scale reasonably: doubling the cluster size makes things approximately twice as fast (there is overhead on the client side which stays constant, of course).
            • As expected, select time after 12 months stays constant for the source tables (which are the slowest), but continues to grow for the object table (because we store more and more objects per region).
            • Cassandra looks very stable; no errors or timeouts were observed during the whole test, but I had to tweak settings to allow larger limits for timeouts and bulk insert sizes (a sketch of the relevant knobs is below).
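
            The ticket does not record which exact limits were raised; as an assumption, these are the usual places for such knobs (client side shown with the Python cassandra-driver):

             from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

             # Raise the client-side request timeout (the driver default is 10 seconds).
             profile = ExecutionProfile(request_timeout=120)
             cluster = Cluster(["10.128.0.91"],
                               execution_profiles={EXEC_PROFILE_DEFAULT: profile})

             # On the server side the corresponding cassandra.yaml settings would be,
             # e.g., read_request_timeout_in_ms / write_request_timeout_in_ms and
             # batch_size_warn_threshold_in_kb / batch_size_fail_threshold_in_kb.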

            Closing this ticket.

            Andy Salnikov made changes -
            Status: To Do → In Progress
            Andy Salnikov made changes -
            Resolution: Done
            Status: In Progress → Done
            Andy Salnikov made changes -
            Remote Link: This issue links to "Jupyter notebook (Web Link)"

              People

              Assignee:
              Andy Salnikov
              Reporter:
              Andy Salnikov
              Watchers:
              Andy Salnikov, Colin Slater, Fritz Mueller

